from:"Ashutosh Bapat"

Re: [HACKERS] GetExistingLocalJoinPath() vs. the docs

2016-03-01 Thread Ashutosh Bapat

I think that you need to take a little broader look at this section.
> At the top, it says "To use any of these functions, you need to
> include the header file foreign/foreign.h in your source file", but
> this function is defined in foreign/fdwapi.h.  It's not clear to me
> whether we should consider moving the prototype, or just document that
> this function is someplace else.  The other functions prototyped in
> fdwapi.h aren't documented at all, except for
> IsImportableForeignTable, which is mentioned in passing.
>
> Further down, the section says "Some object types have name-based
> lookup functions in addition to the OID-based ones:" and you propose
> to put the documentation for this function after that.  But this
> comment doesn't actually describe this particular function.
>

> Actually, this function just doesn't seem to fit into this section at
> all.  It's really quite different from the others listed there.  How
> about something like the attached instead?


Right. Mentioning the function in the description of relevant function
looks better and avoids some duplication.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Issue with NULLS LAST, with postgres_fdw sort pushdown

2016-03-02 Thread Ashutosh Bapat

Thanks Rajkumar for your report. Let me know if the attached patch fixes
the issue.

The code did not add NULL LAST clause the case when pk_nulls_first is false
in pathkey. PFA the fix for the same. I have also added few tests to
postgres_fdw.sql for few combinations of asc/desc and nulls first/last.

On Mon, Feb 29, 2016 at 3:49 PM, Rajkumar Raghuwanshi <
rajkumar.raghuwan...@enterprisedb.com> wrote:

> Hi,
>
> I am testing postgres_fdw sort pushdown feature for PostgreSQL 9.6 DB, and
> I observed below issue.
>
> *Observation: *If giving nulls last option with the order by clause as
> 'desc nulls last', remote query is not considering nulls last and giving
> wrong result in 9.6 version. while in 9.5 it is giving proper result.
>
> for testing, I have a table "fdw_sort_test" in foreign server for which
> postgres_fdw, foreign table created in local server.
>
>  db2=# select * from fdw_sort_test ;
>  id | name
> +--
>   1 | xyz
>   3 |
>   2 | abc
>   4 | pqr
> (4 rows)
>
>on version 9.6 :
>
>  db1=# select * from fdw_sort_test order by name desc
> nulls last;
>   id | name
>  +--
>3 |
>1 | xyz
>4 | pqr
>2 | abc
>  (4 rows)
>
>  db1=# explain verbose select * from fdw_sort_test
> order by name desc nulls last;
> QUERY
> PLAN
>  --
> --
>   Foreign Scan on public.fdw_sort_test
> (cost=100.00..129.95 rows=561 width=122)
> Output: id, name
> Remote SQL: SELECT id, name FROM
> public.fdw_sort_test ORDER BY name DESC
>  (3 rows)
>
>
> on version 9.5 :
>  db1=# select * from fdw_sort_test order by name desc
> nulls last;
>id | name
>   +--
> 1 | xyz
> 4 | pqr
> 2 | abc
> 3 |
>   (4 rows)
>
>  db1=# explain verbose select * from fdw_sort_test
> order by name desc nulls last;
> QUERY
> PLAN
>  --
> 
>   Sort  (cost=152.44..153.85 rows=561 width=122)
> Output: id, name
> Sort Key: fdw_sort_test.name DESC NULLS LAST
> ->  Foreign Scan on public.fdw_sort_test
> (cost=100.00..126.83 rows=561 width=122)
>   Output: id, name
>   Remote SQL: SELECT id, name FROM
> public.fdw_sort_test
>
> *steps to reproduce : *
>
> --connect to sql
> \c postgres postgres
> --create role and database db1, will act as local server
> create role db1 password 'db1' superuser login;
> create database db1 owner=db1;
> grant all on database db1 to db1;
>
> --create role and database db2, will act as foreign server
> create role db2 password 'db2' superuser login;
> create database db2 owner=db2;
> grant all on database db2 to db2;
>
> --connect to db2 and create a table
> \c db2 db2
> create table fdw_sort_test (id integer, name varchar(50));
> insert into fdw_sort_test values (1,'xyz');
> insert into fdw_sort_test values (3,null);
> insert into fdw_sort_test values (2,'abc');
> insert into fdw_sort_test values (4,'pqr');
>
> --connect to db1 and create postgres_fdw
> \c db1 db1
> create extension postgres_fdw;
> create server db2_link_server foreign data wrapper postgres_fdw options
> (host 'db2_machine_ip', dbname 'db2', port 'db_machine_port_no');
> create user mapping for db1 server db2_link_server options (user 'db2',
> password 'db2');
>
> --create a foreign table
> create foreign table fdw_sort_

Re: [HACKERS] Issue with NULLS LAST, with postgres_fdw sort pushdown

2016-03-02 Thread Ashutosh Bapat

On Thu, Mar 3, 2016 at 7:27 AM, Michael Paquier 
wrote:

> On Wed, Mar 2, 2016 at 7:04 PM, Rajkumar Raghuwanshi
>  wrote:
> > On Wed, Mar 2, 2016 at 2:35 PM, Ashutosh Bapat
> >  wrote:
> >>
> >> Thanks Rajkumar for your report. Let me know if the attached patch fixes
> >> the issue.
>
>  if (pathkey->pk_nulls_first)
>  appendStringInfoString(buf, " NULLS FIRST");
> +else
> +appendStringInfoString(buf, " NULLS LAST");
> Per explain.c, this looks inconsistent to me. Shouldn't NULLS LAST be
> applied only if DESC is used in this ORDER BY clause?
>

I assume that you are referring to show_sortorder_options().

As per PG documentation
http://www.postgresql.org/docs/9.4/static/queries-order.html, "By default,
null values sort as if larger than any non-null value; that is, NULLS FIRST
is the default for DESC order, and NULLS LAST otherwise." What
show_sortorder_options() is doing is just trying to avoid printing the
defaults, which is arguably fine for an explain output; it leaves defaults
to be interpreted by user. In this case we are constructing a query to be
sent to the foreign server and it's better not to leave the defaults to be
interpreted by the foreign server; in case it interprets them in different
fashion. get_rule_orderby() also explicitly adds these options.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Pushing down sorted joins

2016-03-08 Thread Ashutosh Bapat

> This patch needs to be rebased.
>
>
Done.


> +   /*
> +* TODO: we should worry about EPQ path but should
> that path have
> +* pathkeys? I guess, that's not really important
> since it's just
> +* going to evaluate the join from whole-row
> references stuffed in the
> +* corresponding EPQ slots, for which the order doesn't
> matter.
> +*/
>
> The pathkeys for the EPQ path don't matter.  It'll only be called to
> recheck one single row, and there's only one order in which you can
> return one row.
>

Right. Removed the TODO


>
> -   if (bms_equal(em->em_relids, rel->relids))
> +   if (bms_is_subset(em->em_relids, rel->relids))
>
> Why do we need this?
>
>
The function find_em_expr_for_rel() find an equivalence member expression
that has all its Vars come from the given relation. It's not necessary that
it will have Vars from relids that are covered by the given relations. E.g.
in query SELECT A.c1, B.c2 FROM A join B ON ... ORDER BY A.c3, there will
be a single equivalence member A.c3 in the pathkeys and em_relids will
indicate only A. Hence instead of equal, (which used to be OK for single
relation join push-down) we have to use subset operation. We want an
equivalence members whose relids are subset of relids contained by given
relation.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 6479640..48bdbef 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -437,20 +437,54 @@ SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1"
  103 | 103
  104 | 104
  105 | 105
  106 | 106
  107 | 107
  108 | 108
  109 | 109
  110 | 110
 (10 rows)
 
+-- A join between local table and foreign join. ORDER BY clause is added to the
+-- foreign join so that the local table can be joined using merge join strategy.
+EXPLAIN (COSTS false, VERBOSE)
+	SELECT t1."C 1" FROM "S 1"."T 1" t1 left join ft1 t2 join ft2 t3 on (t2.c1 = t3.c1) on (t3.c1 = t1."C 1") OFFSET 100 LIMIT 10;
+ QUERY PLAN 
+
+ Limit
+   Output: t1."C 1"
+   ->  Merge Right Join
+ Output: t1."C 1"
+ Merge Cond: (t3.c1 = t1."C 1")
+ ->  Foreign Scan
+   Output: t3.c1
+   Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
+   Remote SQL: SELECT r3."C 1" FROM ("S 1"."T 1" r2 INNER JOIN "S 1"."T 1" r3 ON (TRUE)) WHERE ((r2."C 1" = r3."C 1")) ORDER BY r2."C 1" ASC NULLS LAST
+ ->  Index Only Scan using t1_pkey on "S 1"."T 1" t1
+   Output: t1."C 1"
+(11 rows)
+
+SELECT t1."C 1" FROM "S 1"."T 1" t1 left join ft1 t2 join ft2 t3 on (t2.c1 = t3.c1) on (t3.c1 = t1."C 1") OFFSET 100 LIMIT 10;
+ C 1 
+-
+ 101
+ 102
+ 103
+ 104
+ 105
+ 106
+ 107
+ 108
+ 109
+ 110
+(10 rows)
+
 RESET enable_hashjoin;
 RESET enable_nestloop;
 -- ===
 -- WHERE with remotely-executable conditions
 -- ===
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 1; -- Var, OpExpr(b), Const
  QUERY PLAN  
 -
  Foreign Scan on public.ft1 t1
Output: c1, c2, c3, c4, c5, c6, c7, c8
@@ -862,32 +896,29 @@ SELECT count(c3) FROM ft1 t1 WHERE t1.c1 === t1.c2;
 -- ===
 -- JOIN queries
 -- ===
 -- Analyze ft4 and ft5 so that we have better statistics. These tables do not
 -- have use_remote_estimate set.
 ANALYZE ft4;
 ANALYZE ft5;
 -- join two tables
 EXPLAIN (COSTS false, VERBOSE)
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
-

Re: [HACKERS] Pushing down sorted joins

2016-03-09 Thread Ashutosh Bapat

On Wed, Mar 9, 2016 at 9:22 PM, Robert Haas  wrote:

> On Wed, Mar 9, 2016 at 2:23 AM, Ashutosh Bapat
>  wrote:
> > [ new patch ]
>
> This looks OK to me.  Committed!
>
>
Thanks.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Adjusting the API of pull_var_clause()

2016-03-10 Thread Ashutosh Bapat

Now, I'm pretty sure that the last time we touched pull_var_clause's
> API, we intentionally set it up to force every call site to be visited
> when new behaviors were added.  But right at the moment that's looking
> like it was a bad call.
>
> An alternative API design could look like
>
> #define PVC_INCLUDE_AGGREGATES   0x0001 /* include Aggrefs in output list
> */
> #define PVC_RECURSE_AGGREGATES   0x0002 /* recurse into Aggref arguments */
> #define PVC_INCLUDE_PLACEHOLDERS 0x0004 /* include PlaceHolderVars in
> output list */
> #define PVC_RECURSE_PLACEHOLDERS 0x0008 /* recurse into PlaceHolderVar
> arguments */
>
>
extern List *pull_var_clause(Node *node, int flags);
>
> with calls along the line of
>
> pull_var_clause(node, PVC_INCLUDE_AGGREGATES | PVC_RECURSE_PLACEHOLDERS);
>
> the default behavior if you specify no flag being "error if node type
> is seen".
>
> The attraction of this approach is that if we add another behavior
> to pull_var_clause, while we'd still likely need to run around and
> check every call site, it wouldn't be positively guaranteed that
> we'd need to edit every darn one of them.
>

That can be a problem for extension or any code outside the PG repository
that uses pull_var_clause(). Right now they would notice it because
compilation will fail. Those extensions wouldn't know that a new option has
been added and will be presented the result with default option. That may
not be bad for window nodes but I am sure that, that would be the case for
other nodes.

The name of the function suggests that should get all the Var nodes from a
given expression. Throwing error when an unexpected node is encountered,
seems to be a side effect. So RECURSE should be the default behaviour and
not REJECT.

I am not sure whether REJECT behaviour is something of a sanity check and
not the real thing. How many calls which specify REJECT_AGGREGATE, really
expect an aggregate to be in the expression passed. Probably not. If they
really case, shouldn't there be a separate API for checking just that. In
fact, we may want to change the  API to indicate where to stop recursing
e.g. at aggregate nodes or placeholder nodes or window nodes. Since we are
thinking of changing the API, we may consider this change as well.
Although, I think this would have been OK when pull_var_clause was being
written afresh. Now, that we have this API, I am not sure whether the
effort is worth the result.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Obsolete comment in postgres_fdw.c

2016-03-14 Thread Ashutosh Bapat

On Mon, Mar 14, 2016 at 9:05 AM, Etsuro Fujita 
wrote:

> Hi,
>
> Here is the comments for foreign_join_ok in postgres_fdw.c:
>
> /*
>  * Assess whether the join between inner and outer relations can be
> pushed down
>  * to the foreign server. As a side effect, save information we obtain
> in this
>  * function to PgFdwRelationInfo passed in.
>  *
>  * Joins that satisfy conditions below are safe to push down.
>  *
>  * 1) Join type is INNER or OUTER (one of LEFT/RIGHT/FULL)
>  * 2) Both outer and inner portions are safe to push-down
>  * 3) All foreign tables in the join belong to the same foreign server
> and use
>  *the same user mapping.
>  * 4) All join conditions are safe to push down
>  * 5) No relation has local filter (this can be relaxed for INNER JOIN,
> if we
>  *can move unpushable clauses upwards in the join tree).
>  */
>
>
The condition 3 is now checked by the core, so I'd like to remove that
> condition from the above comments.
>

It was left there intentionally to document all the conditions in one place
(some from the core and some from the FDW itself), for a ready reference.
In case tomorrow core thinks that matching user mapping is not required,
postgres_fdw would still require it to be incorporated.


>
> In addition, I'd like to update some related comments in
> src/include/nodes/relation.h and src/backend/optimizer/path/joinpath.c.
>

Those look fine. Sorry for missing those in the commit and thanks for
providing a patch for the same.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Use %u to print user mapping's umid and userid

2016-03-14 Thread Ashutosh Bapat

On Mon, Mar 14, 2016 at 1:29 PM, Etsuro Fujita 
wrote:

> Hi,
>
> On 2016/02/09 14:09, Ashutosh Bapat wrote:
>
>> Sorry, I was wrong. For public user mapping userid is 0 (InvalidOid),
>> which is returned as is in UserMapping object. I confused InvalidOid
>> with -1.
>>
>
> I think the following umid handling in postgresGetForeignPlan has the same
> issue:
>
> /*
>  * Build the fdw_private list that will be available to the executor.
>  * Items in the list must match order in enum FdwScanPrivateIndex.
>  */
> fdw_private = list_make4(makeString(sql.data),
>  retrieved_attrs,
>  makeInteger(fpinfo->fetch_size),
>  makeInteger(foreignrel->umid));
>
> I don't think it's correct to use makeInteger for the foreignrel's umid.
>

As long as we are using makeInteger() and inVal() pair to set and extract
the values, it should be fine.

>
> You store the umid in the fdw_private list here and extract it from the
> list in postgresBeginForeignScan, to get the user mapping.  But we really
> need that?  We have a validated plan when getting called from
> postgresBeginForeignScan, so if foreign join, we can simply pick any of the
> plan's fs_relids and use it to identify which user to do the remote access
> as, in the same way as for foreign tables.
>

We have done that calculation ones while creating the plan, why do we want
to do that again? For a base relation, the user mapping needs to be found
out at the time of execution, since it could change between plan creation
and execution. But for a join plan invalidation takes care of this change.


> Attached is a patch for that.
>
> Best regards,
> Etsuro Fujita
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Re: [COMMITTERS] pgsql: Only try to push down foreign joins if the user mapping OIDs mat

2016-03-14 Thread Ashutosh Bapat

On Mon, Mar 14, 2016 at 8:21 AM, Tom Lane  wrote:

> Etsuro Fujita  writes:
> > On 2016/03/13 4:46, Andres Freund wrote:
> >> ... The difference apears to be the
> >> check that's now in build_simple_rel() - there was nothing hitting the
> >> user mapping code before for file_fdw.
>
> > Exactly.
>
> > I'm not sure it's worth complicating the code to keep that behavior, so
> > I'd vote for adding the change notice to 9.6 release notes or something
> > like that in addition to updating file-fdw.sgml.
>
> Perhaps it would be useful for an FDW to be able to specify that user
> mappings are meaningless to it?  And then bypass this check for such FDWs?
>

In such a case, whether FDWs be given chance to push down joins? I guess
the answer is yes, but wanted to confirm.


>
> I'm not really sold on enforcing that people create meaningless user
> mappings.  For one thing, they're likely to be sloppy about it, which
> could lead to latent security problems if the FDW later acquires a
> concept that user mappings mean something.
>
>
Hmm. Should we let FDW handler set a boolean in fdwroutine specifying
whether the core code should bother about user mapping or not. This way the
author of FDW decides whether s/he is going to write code that uses user
mappings or not. A small problem with that is PG documentation describes
fdwroutine as a structure holding function pointers and now it will store a
boolean variable. But I think that can be managed either by having this
option as a function pointer returning boolean or changing the
documentation.

Other problem is what should we do when a user creates or has an existing
user mapping for an FDW which specifies that user mapping is meaningless to
it. Should we allow the user mapping to be created but ignore it or do not
allow it to be created? In the later case, what should happen to the
existing user mappings?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Re: [COMMITTERS] pgsql: Only try to push down foreign joins if the user mapping OIDs mat

2016-03-15 Thread Ashutosh Bapat

Here's patch which fixes the issue using Robert's idea.

On Mon, Mar 14, 2016 at 10:53 PM, Robert Haas  wrote:

> On Mon, Mar 14, 2016 at 1:05 PM, Tom Lane  wrote:
> > Robert Haas  writes:
> >> On Sun, Mar 13, 2016 at 10:51 PM, Tom Lane  wrote:
> >>> I'm not really sold on enforcing that people create meaningless user
> >>> mappings.  For one thing, they're likely to be sloppy about it, which
> >>> could lead to latent security problems if the FDW later acquires a
> >>> concept that user mappings mean something.
> >
> >> I think we should just fix build_simple_rel() so that it doesn't fail
> >> if there is no user mapping.  It can just set rel->umid to InvalidOid
> >> in that case, and if necessary we can adjust the code elsewhere to
> >> tolerate that.  This wasn't an intentional behavior change, and I
> >> think we should just put things back to the way they were.
> >
> > So, allow rel->umid to be InvalidOid if there's no user mapping, and
> > when dealing with a join, allow combining when both sides have
> InvalidOid?
>
> Exactly.  And we should make sure (possibly with a regression test)
> that postgres_fdw handles that case correctly - i.e. with the right
> error.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 416753d..35db4af 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -166,20 +166,23 @@ DROP TABLE agg;
 SET ROLE file_fdw_superuser;
 SELECT * FROM agg_text ORDER BY a;
 SET ROLE file_fdw_user;
 SELECT * FROM agg_text ORDER BY a;
 SET ROLE no_priv_user;
 SELECT * FROM agg_text ORDER BY a;   -- ERROR
 SET ROLE file_fdw_user;
 \t on
 EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_text WHERE a > 0;
 \t off
+-- file FDW allows foreign tables to be accessed without user mapping
+DROP USER MAPPING FOR file_fdw_user SERVER file_server;
+SELECT * FROM agg_text ORDER BY a;
 
 -- privilege tests for object
 SET ROLE file_fdw_superuser;
 ALTER FOREIGN TABLE agg_text OWNER TO file_fdw_user;
 ALTER FOREIGN TABLE agg_text OPTIONS (SET format 'text');
 SET ROLE file_fdw_user;
 ALTER FOREIGN TABLE agg_text OPTIONS (SET format 'text');
 SET ROLE file_fdw_superuser;
 
 -- cleanup
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 8719694..26f4999 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -315,31 +315,41 @@ SELECT * FROM agg_text ORDER BY a;   -- ERROR
 ERROR:  permission denied for relation agg_text
 SET ROLE file_fdw_user;
 \t on
 EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_text WHERE a > 0;
  Foreign Scan on public.agg_text
Output: a, b
Filter: (agg_text.a > 0)
Foreign File: @abs_srcdir@/data/agg.data
 
 \t off
+-- file FDW allows foreign tables to be accessed without user mapping
+DROP USER MAPPING FOR file_fdw_user SERVER file_server;
+SELECT * FROM agg_text ORDER BY a;
+  a  |b
+-+-
+   0 | 0.09561
+  42 |  324.78
+  56 | 7.8
+ 100 |  99.097
+(4 rows)
+
 -- privilege tests for object
 SET ROLE file_fdw_superuser;
 ALTER FOREIGN TABLE agg_text OWNER TO file_fdw_user;
 ALTER FOREIGN TABLE agg_text OPTIONS (SET format 'text');
 SET ROLE file_fdw_user;
 ALTER FOREIGN TABLE agg_text OPTIONS (SET format 'text');
 ERROR:  only superuser can change options of a file_fdw foreign table
 SET ROLE file_fdw_superuser;
 -- cleanup
 RESET ROLE;
 DROP EXTENSION file_fdw CASCADE;
-NOTICE:  drop cascades to 8 other objects
+NOTICE:  drop cascades to 7 other objects
 DETAIL:  drop cascades to server file_server
-drop cascades to user mapping for file_fdw_user on server file_server
 drop cascades to user mapping for file_fdw_superuser on server file_server
 drop cascades to user mapping for no_priv_user on server file_server
 drop cascades to foreign table agg_text
 drop cascades to foreign table agg_csv
 drop cascades to foreign table agg_bad
 drop cascades to foreign table text_csv
 DROP ROLE file_fdw_superuser, file_fdw_user, no_priv_user;
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 48bdbef..48a7afa 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1699,27 +1699,44 @@ EXECUTE join_stmt;
  30 | 30
  32 |   
  34

Re: [HACKERS] Re: [COMMITTERS] pgsql: Only try to push down foreign joins if the user mapping OIDs mat

2016-03-16 Thread Ashutosh Bapat

On Wed, Mar 16, 2016 at 2:14 AM, Robert Haas  wrote:

> On Tue, Mar 15, 2016 at 6:44 AM, Ashutosh Bapat
>  wrote:
> > Here's patch which fixes the issue using Robert's idea.
>
> Please at least check your patches with 'git diff --check'


Thanks.


> before
> submitting them.  And where it's not too much trouble, pgindent them.
> Or at least make them look something like what pgindent would have
> produced, instead of having the line lengths be all over the place.
>

Sorry. PFA the patch with git diff --check output fixed.


>
>  -- change the session user to view_owner and execute the statement.
> Because of
>  -- change in session user, the plan should get invalidated and created
> again.
> --- While creating the plan, it should throw error since there is no
> user mapping
> --- available for view_owner.
> +-- The join will not be pushed down since the joining relations do not
> have a
> +-- valid user mapping.
>
> Now what's going on here?  It seems to me that either postgres_fdw
> requires a user mapping (in which case this ought to fail) or it
> doesn't (in which case this ought to push the join down).  I don't
> understand how working but not pushing the join down can ever be the
> right behavior.
>
>
In 9.5, postgres_fdw allowed to prepare statements involving foreign tables
without an associated user mapping as long as planning did not require the
user mapping. Remember, planner would require user mapping in case
use_remote_estimate is ON for that foreign table. The user mapping would be
certainly required at the time of execution. So executing such a prepared
statement would throw error. If somebody created a user mapping between
prepare and execute, execute would not throw an error. A join can be pushed
down only when user mappings associated with the joining relations are
known and they match. But given the behavior of 9.5 we should let the
prepare succeed, even if the join can not be pushed down because of unknown
user mapping. Before this fix, the test was not letting the prepare
succeed, which is not compliant with 9.5.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 416753d..35db4af 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -166,20 +166,23 @@ DROP TABLE agg;
 SET ROLE file_fdw_superuser;
 SELECT * FROM agg_text ORDER BY a;
 SET ROLE file_fdw_user;
 SELECT * FROM agg_text ORDER BY a;
 SET ROLE no_priv_user;
 SELECT * FROM agg_text ORDER BY a;   -- ERROR
 SET ROLE file_fdw_user;
 \t on
 EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_text WHERE a > 0;
 \t off
+-- file FDW allows foreign tables to be accessed without user mapping
+DROP USER MAPPING FOR file_fdw_user SERVER file_server;
+SELECT * FROM agg_text ORDER BY a;
 
 -- privilege tests for object
 SET ROLE file_fdw_superuser;
 ALTER FOREIGN TABLE agg_text OWNER TO file_fdw_user;
 ALTER FOREIGN TABLE agg_text OPTIONS (SET format 'text');
 SET ROLE file_fdw_user;
 ALTER FOREIGN TABLE agg_text OPTIONS (SET format 'text');
 SET ROLE file_fdw_superuser;
 
 -- cleanup
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 8719694..26f4999 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -315,31 +315,41 @@ SELECT * FROM agg_text ORDER BY a;   -- ERROR
 ERROR:  permission denied for relation agg_text
 SET ROLE file_fdw_user;
 \t on
 EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_text WHERE a > 0;
  Foreign Scan on public.agg_text
Output: a, b
Filter: (agg_text.a > 0)
Foreign File: @abs_srcdir@/data/agg.data
 
 \t off
+-- file FDW allows foreign tables to be accessed without user mapping
+DROP USER MAPPING FOR file_fdw_user SERVER file_server;
+SELECT * FROM agg_text ORDER BY a;
+  a  |b
+-+-
+   0 | 0.09561
+  42 |  324.78
+  56 | 7.8
+ 100 |  99.097
+(4 rows)
+
 -- privilege tests for object
 SET ROLE file_fdw_superuser;
 ALTER FOREIGN TABLE agg_text OWNER TO file_fdw_user;
 ALTER FOREIGN TABLE agg_text OPTIONS (SET format 'text');
 SET ROLE file_fdw_user;
 ALTER FOREIGN TABLE agg_text OPTIONS (SET format 'text');
 ERROR:  only superuser can change options of a file_fdw foreign table
 SET ROLE file_fdw_superuser;
 -- cleanup
 RESET ROLE;
 DROP EXTENSION file_fdw CASCADE;
-NOTICE:  drop cascades to 8 other objects
+NOTICE:  drop cascades to 7 other objects
 DETAIL:  drop cascades to server file_server
-drop cascades to user mapping for file_fdw_user on server file_server
 drop cascades to user mapping for file_fdw_superuser on server file_server
 drop cascades to user mapping for no_priv_user on server file_server
 drop cascad

Re: [HACKERS] Odd system-column handling in postgres_fdw join pushdown patch

2016-03-18 Thread Ashutosh Bapat

(2) when any of
>> xmins, xmaxs, cmins, and cmaxs are requested, postgres_fdw gives up
>> pushing down foreign joins.  (We might be able to set appropriate
>> values
>> for them locally the same way as for tableoids, but I'm not sure it's
>> worth complicating the code.)  I think that would be probably OK,
>> because users wouldn't retrieve any such columns in practice.
>>
>
> In that patch you have set pushdown_safe to false for the base relation
>> fetching system columns. But pushdown_safe = false means that that
>> relation is not safe to push down. A base foreign relation is always
>> safe to push down, so should have pushdown_safe = true always. Instead,
>> I suggest having a separate boolean has_unshippable_syscols (or
>> something with similar name) in PgFdwRelationInfo, which is set to true
>> in such case. In foreign_join_ok, we return false (thus not pushing down
>> the join), if any of the joining relation has that attribute set. By
>> default this member is false.
>>
>
> Maybe I'm missing something, but why do you consider that base foreign
> tables need always be safe to push down?  IIUC, the pushdown_safe flag is
> used only for foreign_join_ok, so I think that a base foreign table needs
> not necessarily be safe to push down.
>
>
A base foreign table "always" fetches data from the foreign server, so it
"has to be" always safe to push down. pushdown_safe flag is designated to
tell whether the relation corresponding to PgFdwRelationInfo where this
flag is set is safe to push down.Right now it's only used for joins but in
future it would be used for any push down of higher operations. It seems
very much possible after the upper pathification changes. We can not have a
query sent to the foreign server for a relation, when pushdown_safe is
false for that. Your patch does that for foreign base relation which try to
fetch system columns.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Odd system-column handling in postgres_fdw join pushdown patch

2016-03-19 Thread Ashutosh Bapat

On Thu, Mar 17, 2016 at 4:30 PM, Etsuro Fujita 
wrote:

> Hi,
>
> I noticed that the postgres_fdw join pushdown patch retrieves system
> columns other than ctid (and oid) from the remote server as shown in the
> example:
>
> postgres=# explain verbose select foo.tableoid, foo.xmin, foo.cmin,
> foo.xmax, foo.cmax, foo.* from foo, bar where foo.a = bar.a;
>
>QUERY PLAN
>
>
> 
> 
>  Foreign Scan  (cost=100.00..102.09 rows=2 width=28)
>Output: foo.tableoid, foo.xmin, foo.cmin, foo.xmax, foo.cmax, foo.a,
> foo.b
>Relations: (public.foo) INNER JOIN (public.bar)
>Remote SQL: SELECT r1.tableoid, r1.xmin, r1.cmin, r1.xmax, r1.cmax,
> r1.a, r1.b FROM (public.foo r1 INNER JOIN public.bar r2 ON (TRUE)) WHERE
> ((r1.a =
>  r2.a))
> (4 rows)
>

Thanks for catching the bug and producing a patch.

>
> BUT: we don't make any effort to ensure that local and remote values
> match, so system columns other than ctid and oid should not be retrieved
> from the remote server.  So, I'd like to propose: (1) when tableoids are
> requested from the remote server, postgres_fdw sets valid values for
> them locally, instead (core should support that?) and

If we are disabling join pushdown when the targetlist has other system
columns, shouldn't we treat tableoid in the same fashion. We should disable
join pushdown when tableoid is requested?

I agree that we might want to do this in core instead of FDW specific core.
That way we avoid each FDW implementing its own solution. Ultimately, all
that needs to be done to push OID of the foreign table in place of tableoid
column. The core code can do that. It already does that for the base
tables.

> (2) when any of
> xmins, xmaxs, cmins, and cmaxs are requested, postgres_fdw gives up
> pushing down foreign joins.  (We might be able to set appropriate values
> for them locally the same way as for tableoids, but I'm not sure it's
> worth complicating the code.)  I think that would be probably OK,
> because users wouldn't retrieve any such columns in practice.
>

In that patch you have set pushdown_safe to false for the base relation
fetching system columns. But pushdown_safe = false means that that relation
is not safe to push down. A base foreign relation is always safe to push
down, so should have pushdown_safe = true always. Instead, I suggest having
a separate boolean has_unshippable_syscols (or something with similar name)
in PgFdwRelationInfo, which is set to true in such case. In
foreign_join_ok, we return false (thus not pushing down the join), if any
of the joining relation has that attribute set. By default this member is
false.

Even for a base table those values are rather random, although they are not
fetched from the foreign server. Instead of not pushing the join down, we
should push the join down without fetching those attributes. While
constructing the query, don't include these system attributes in SELECT
clause and don't include corresponding positions in retrieved_attributes
list. That means those attributes won't be set while fetching the row from
the foreign server and will have garbage values in corresponding places. I
guess that would work.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Re: [COMMITTERS] pgsql: Only try to push down foreign joins if the user mapping OIDs mat

2016-03-19 Thread Ashutosh Bapat

On Wed, Mar 16, 2016 at 10:22 PM, Tom Lane  wrote:

> Robert Haas  writes:
> > On Wed, Mar 16, 2016 at 4:10 AM, Ashutosh Bapat <
> > ashutosh.ba...@enterprisedb.com> wrote:
> >> In 9.5, postgres_fdw allowed to prepare statements involving foreign
> >> tables without an associated user mapping as long as planning did not
> >> require the user mapping. Remember, planner would require user mapping
> in
> >> case use_remote_estimate is ON for that foreign table. The user mapping
> >> would be certainly required at the time of execution. So executing such
> a
> >> prepared statement would throw error. If somebody created a user mapping
> >> between prepare and execute, execute would not throw an error. A join
> can
> >> be pushed down only when user mappings associated with the joining
> >> relations are known and they match. But given the behavior of 9.5 we
> should
> >> let the prepare succeed, even if the join can not be pushed down
> because of
> >> unknown user mapping. Before this fix, the test was not letting the
> prepare
> >> succeed, which is not compliant with 9.5.
>
> > If a query against a single table with no user mapping is legal, I don't
> > see why pushing down a join between two tables neither of which has a
> user
> > mapping shouldn't also be legal.
>
> The key point here is that Ashutosh is arguing on the basis of the
> behavior of postgres_fdw, which is not representative of all FDWs.
> The core code has no business assuming that all FDWs require user
> mappings; file_fdw is a counterexample.
>
>
Here's what the patch is doing.

The "core" code FDW is given chance to add paths for a join between foreign
relations if they are from the same (valid) server and have same user
mappings, even if the user mapping is invalid. This is code in
build_join_rel(). So, from core code's perspective it's perfectly fine to
push a join down when both sides do not have valid user mapping associated
with them. So, file_fdw is capable of implementing join pushdown even
without having user mapping.

postgres_fdw code however is different. Rest of the discussion only
pertains to postgres_fdw. The comment in postgres_fdw.sql testcase, to
which Robert objected, is relevant only for postgres_fdw.

postgres_fdw requires user mapping to execute the query. For base foreign
tables, it's fine not to have a user mapping at the time of planning as
long as planner doesn't need to query the foreign server. But it certainly
requires a user mapping at the time of execution, or else it throws error
at the time of execution. So, Robert's statement "If a query against a
single table with no user mapping is legal" is not entirely correct in the
context of postgres_fdw; postgresGetForeignRelSize(), which is called
during planning, can throw error if it doesn't find a valid user mapping.
If a join between two postgres_fdw foreign relations without valid user
mappings is decided to be pushed down at the time of planning, it will
require a user mapping at the time of execution. This user mapping is
derived from the joining relations recursively. If all of them have same
user mapping (even if invalid) it's fine. If it's invalid, we will throw
error at the time of execution. But if they acquire different user
mappings, postgres_fdw can not fire the join query. It can not use any of
the user mappings since that will compromise security. So, we have landed
with a plan which can not be executed. To be on the safer side,
postgres_fdw does not push a join down if joining sides do not have a valid
user mapping. A base foreign relation with postgres_fdw will require a user
mapping, for all serious uses. So, it's not likely that we will end up with
joins between postgres_fdw foreign relations with invalid user mapping.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Postgres_fdw join pushdown - getting server crash in left outer join of three table

2016-03-21 Thread Ashutosh Bapat

Thanks Michael for looking into this.



> In get_useful_ecs_for_relation, it seems to me that this assertion
> should be removed and replaces by an actual check because even if
> right_ec and left_ec are initialized, we cannot be sure that ec_relids
> contains the relations specified:
> /*
>  * restrictinfo->mergeopfamilies != NIL is sufficient to guarantee
>  * that left_ec and right_ec will be initialized, per comments in
>  * distribute_qual_to_rels, and rel->joininfo should only contain
> ECs
>  * where this relation appears on one side or the other.
>  */
> if (bms_is_subset(relids, restrictinfo->right_ec->ec_relids))
> useful_eclass_list = list_append_unique_ptr(useful_eclass_list,
>
>  restrictinfo->right_ec);
> else
> {
> Assert(bms_is_subset(relids,
> restrictinfo->left_ec->ec_relids));
> useful_eclass_list = list_append_unique_ptr(useful_eclass_list,
>
> restrictinfo->left_ec);
> }
>

An EC covers all the relations covered by all the equivalence members it
contains. In case of mergejoinable clause for outer join, EC may cover just
a single relation whose column appears on either side of the clause. In
this case, bms_is_subset() for a given join relation covering single
relation in EC will be false. So, we have to use bms_overlap() instead of
bms_is_subset(). The caller get_useful_pathkeys_for_rel() extracts the
equivalence member (if any), which is entirely covered by the given
relation. Otherwise, you are correct that we have to convert the assertion
into a condition. I have added comments in get_useful_ecs_for_relation()
explaining, why.

See for example the attached (with more tests including combinations
> of joins, and three-table joins). I have added an open item for 9.6 on
> the wiki.
>

Thanks for those tests. Actually, that code is relevant for joins which can
not be pushed down to the foreign server. For such joins we try to add
pathkeys which will help merge joins. I have included the relevant tests
rewriting them to use local tables, so that the entire join is not pushed
down to the foreign server.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index a7f32f3..d38ff86 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -471,20 +471,88 @@ SELECT t1."C 1" FROM "S 1"."T 1" t1 left join ft1 t2 join ft2 t3 on (t2.c1 = t3.
  103
  104
  105
  106
  107
  108
  109
  110
 (10 rows)
 
+-- Test similar to above, except that the full join prevents any equivalence
+-- classes from being merged. This produces single relation equivalence classes
+-- included in join restrictions.
+EXPLAIN (COSTS false, VERBOSE)
+	SELECT t1."C 1", t2.c1, t3.c1 FROM "S 1"."T 1" t1 left join ft1 t2 full join ft2 t3 on (t2.c1 = t3.c1) on (t3.c1 = t1."C 1") OFFSET 100 LIMIT 10;
+QUERY PLAN
+--
+ Limit
+   Output: t1."C 1", t2.c1, t3.c1
+   ->  Merge Right Join
+ Output: t1."C 1", t2.c1, t3.c1
+ Merge Cond: (t3.c1 = t1."C 1")
+ ->  Foreign Scan
+   Output: t3.c1, t2.c1
+   Relations: (public.ft2 t3) LEFT JOIN (public.ft1 t2)
+   Remote SQL: SELECT r3."C 1", r2."C 1" FROM ("S 1"."T 1" r3 LEFT JOIN "S 1"."T 1" r2 ON (((r2."C 1" = r3."C 1" ORDER BY r3."C 1" ASC NULLS LAST
+ ->  Index Only Scan using t1_pkey on "S 1"."T 1" t1
+   Output: t1."C 1"
+(11 rows)
+
+SELECT t1."C 1", t2.c1, t3.c1 FROM "S 1"."T 1" t1 left join ft1 t2 full join ft2 t3 on (t2.c1 = t3.c1) on (t3.c1 = t1."C 1") OFFSET 100 LIMIT 10;
+ C 1 | c1  | c1  
+-+-+-
+ 101 | 101 | 101
+ 102 | 102 | 102
+ 103 | 103 | 103
+ 104 | 104 | 104
+ 105 | 105 | 105
+ 106 | 106 | 106
+ 107 | 107 | 107
+ 108 | 108 | 108
+ 109 | 109 | 109
+ 110 | 110 | 110
+(10 rows)
+
+-- Test similar to above with all full outer joins
+EXPLAIN (COSTS false, VERBOSE)
+	SELECT t1."C 1", t2.c1, t3.c1 FROM "S 1"."T 1" t1 full join ft1 t2 full join ft2 t3 on (t2.c1 = t3.c1) on (t3.c1 = t1."C 1") OFFSET 100 LIMIT 10;
+

Re: [HACKERS] Odd system-column handling in postgres_fdw join pushdown patch

2016-03-21 Thread Ashutosh Bapat

On Tue, Mar 22, 2016 at 8:03 AM, Etsuro Fujita 
wrote:

> On 2016/03/19 4:51, Robert Haas wrote:
>
>> On Thu, Mar 17, 2016 at 7:00 AM, Etsuro Fujita
>>  wrote:
>>
>>> So, I'd like to propose: (1) when tableoids are
>>> requested from the remote server, postgres_fdw sets valid values for
>>> them locally, instead (core should support that?)
>>>
>>
> Sure.
>>
>
> and (2) when any of
>>> xmins, xmaxs, cmins, and cmaxs are requested, postgres_fdw gives up
>>> pushing down foreign joins.  (We might be able to set appropriate values
>>> for them locally the same way as for tableoids, but I'm not sure it's
>>> worth complicating the code.)  I think that would be probably OK,
>>> because users wouldn't retrieve any such columns in practice.
>>>
>>
> Now that seems like the wrong reaction.  I mean, aren't these just
>> going to be 0 or something?  Refusing to push the join down seems
>> strange.
>>
>
> OK, I'll modify the patch so that the join is pushed down even if any of
> xmins, xmaxs, cmins, and cmaxs are requested.  Do you think which one
> should set values for these as well as tableoids, postgres_fdw or core?
>

Earlier in this mail chain, I suggested that the core should take care of
storing the values for these columns. But instead, I think, core should
provide functions which can be used by FDWs, if they want, to return values
for those columns. Something like Datum
get_syscol_value(RelOptInfo/Relation, attno). The function will return
Datum 0 for most of the columns and table's OID for tableoid. My 0.02.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Odd system-column handling in postgres_fdw join pushdown patch

2016-03-22 Thread Ashutosh Bapat

On Tue, Mar 22, 2016 at 5:05 PM, Etsuro Fujita 
wrote:

> On 2016/03/22 14:54, Ashutosh Bapat wrote:
>
>> On Tue, Mar 22, 2016 at 8:03 AM, Etsuro Fujita
>> mailto:fujita.ets...@lab.ntt.co.jp>> wrote:
>> OK, I'll modify the patch so that the join is pushed down even if
>> any of xmins, xmaxs, cmins, and cmaxs are requested.  Do you think
>> which one should set values for these as well as tableoids,
>> postgres_fdw or core?
>>
>
> Earlier in this mail chain, I suggested that the core should take care
>> of storing the values for these columns. But instead, I think, core
>> should provide functions which can be used by FDWs, if they want, to
>> return values for those columns. Something like Datum
>> get_syscol_value(RelOptInfo/Relation, attno). The function will return
>> Datum 0 for most of the columns and table's OID for tableoid. My 0.02.
>>
>
> What I had in mind was (1) create_foreignscan_plan would create Lists from
> the ForeignScan's fdw_scan_tlist: (a) indexes/OID values of tableoids in
> fdw_scan_tlist, and (b) indexes of xids and cids in fdw_scan_tlist, and
> then (2) ForeignNext would set the OID values for the tableoid columns in
> the scan tuple, using the Lists (a), and appropriate values (0 or
> something) for the xid and cid columns in the scan tuple, using the List
> (b).
>

Looks Ok to me, although, that way an FDW looses an ability to use its own
values for those columns, in case it wants to. For example, while using
postgres_fdw for sharding, it might help saving xmax, xmin, cmax, cmin from
the foreign server and use them while communicating with the foreign server.


>
> Best regards,
> Etsuro Fujita
>
>
>


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Odd system-column handling in postgres_fdw join pushdown patch

2016-03-22 Thread Ashutosh Bapat

On Wed, Mar 23, 2016 at 8:20 AM, Etsuro Fujita 
wrote:

> On 2016/03/22 21:10, Ashutosh Bapat wrote:
>
>> On Tue, Mar 22, 2016 at 5:05 PM, Etsuro Fujita
>> mailto:fujita.ets...@lab.ntt.co.jp>> wrote:
>> On 2016/03/22 14:54, Ashutosh Bapat wrote:
>> Earlier in this mail chain, I suggested that the core should
>> take care
>> of storing the values for these columns. But instead, I think,
>> core
>> should provide functions which can be used by FDWs, if they want,
>> to
>> return values for those columns. Something like Datum
>> get_syscol_value(RelOptInfo/Relation, attno). The function will
>> return
>> Datum 0 for most of the columns and table's OID for tableoid. My
>> 0.02.
>>
>
> What I had in mind was (1) create_foreignscan_plan would create
>> Lists from the ForeignScan's fdw_scan_tlist: (a) indexes/OID values
>> of tableoids in fdw_scan_tlist, and (b) indexes of xids and cids in
>> fdw_scan_tlist, and then (2) ForeignNext would set the OID values
>> for the tableoid columns in the scan tuple, using the Lists (a), and
>> appropriate values (0 or something) for the xid and cid columns in
>> the scan tuple, using the List (b).
>>
>
> Looks Ok to me, although, that way an FDW looses an ability to use its
>> own values for those columns, in case it wants to. For example, while
>> using postgres_fdw for sharding, it might help saving xmax, xmin, cmax,
>> cmin from the foreign server and use them while communicating with the
>> foreign server.
>>
>
> Yeah, it might be the case.
>
> On second thoughts, I changed my mind; I think it'd be better for the
> FDW's to set values for tableoids, xids, and cids in the scan tuple. The
> reason other than your suggestion is because expressions in fdw_scan_tlist
> that contain such columns are not necessarily simple Vars and because such
> expressions might be evaluated more efficiently by the FDW than core.  We
> assume in postgres_fdw that expressions in fdw_scan_tlist are always simple
> Vars, though.
>
> I'm not sure it's worth providing functions you suggested, because we
> can't assume that columns in the scan tuple are always simple Var columns,
> as I said above.
>
>
An FDW can choose not to use those functions, so I don't see a connection
between scan list having simple Vars and existence of those functions
(actually a single one). But having those function would minimize the code
that each FDW has to write, in case they want those functions. E.g. we have
to translate Var::varno to tableoid in case that's requested by pulling RTE
and then getting oid out from there. If that functionality is available in
the core, 1. the code is not duplicated 2. every FDW will get the same
tableoid. Similarly for the other columns.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Postgres_fdw join pushdown - getting server crash in left outer join of three table

2016-03-24 Thread Ashutosh Bapat

> > Thanks for the report and the testing.  I have committed the patch.
>
>
Thanks.


> Cool, I have refreshed the wiki page for open items accordingly.
>

Thanks.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Odd system-column handling in postgres_fdw join pushdown patch

2016-03-24 Thread Ashutosh Bapat

On Thu, Mar 24, 2016 at 9:31 AM, Etsuro Fujita 
wrote:

> On 2016/03/23 13:44, Ashutosh Bapat wrote:
>
>> An FDW can choose not to use those functions, so I don't see a
>> connection between scan list having simple Vars and existence of those
>> functions (actually a single one). But having those function would
>> minimize the code that each FDW has to write, in case they want those
>> functions. E.g. we have to translate Var::varno to tableoid in case
>> that's requested by pulling RTE and then getting oid out from there. If
>> that functionality is available in the core, 1. the code is not
>> duplicated 2. every FDW will get the same tableoid. Similarly for the
>> other columns.
>>
>
> OK.  Then, I'd like to propose a function that would create interger Lists
> of indexes of tableoids, xids and cids plus

Ok,

> an OID List of these tableoids,

I didn't get this.

> in a given fdw_scan_tlist, on the assumption that each expression in the
> fdw_scan_tlist is a simple Var.

I guess this is Ok. In fact, at least for now an expression involving any
of those columns is not pushable to the foreign server, as the expression
can not be evaluated there. So, if we come across such a case in further
pushdowns, we will need to have a different solution for pushing down such
target lists.

> I'd also like to propose another function that would fill system columns
> using these Lists when creating a scan tuple.
>
> Ok.

I had imagined that the code to extract the above lists and filling the
values in scan tuple will be in FDW. We only provide a function to supply
those values. But what you propose might actually be much practical.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Odd system-column handling in postgres_fdw join pushdown patch

2016-03-24 Thread Ashutosh Bapat

A much simpler solution, that will work with postgres_fdw, might be to just
deparse these columns with whatever random values (except for tableoid)
they are expected to have in those places. Often these values can simply be
NULL or 0. For tableoid deparse it to 'oid value'::oid. Thus for a user
query

select t1.taleoid, t2.xmax, t1.c1, t2.c2 from t1 join t2 on (...) ... --
where t1 and t2 are foreign tables with same names on the foreign server.

the query sent to the foreign server would look like

select '15623'::oid, NULL, t1.c1, t2.c2 from t1 join t2 on (...) ... --
where '15623' is oid of t1 on local server.

This does spend more bandwidth than necessary and affect performance, here
is why the approach might be better,
1. It's not very common to request these system columns in a "join" query
involving foreign tables. Usually they will have user columns or ctid
(DMLs) but very rarely other system columns.

2. This allows expressions involving these system columns to be pushed
down, whenever we will start pushing them down in the targetlist.

3. The changes to the code are rather small. deparseColumnRef() will need
to produce the strings above instead of actual column names.

4. The approach will work with slight change, if and when, we need the
actual system column values from the foreign server. That time the above
function needs to deparse the column names instead of constant values.

Having to hardcode tableoid at the time of planning should be fine since
change in tableoid between planning and execution will trigger plan cache
invalidation. I haven't tried this though.

Sorry for bringing this solution late to the table.

On Thu, Mar 24, 2016 at 3:04 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

>
>
> On Thu, Mar 24, 2016 at 9:31 AM, Etsuro Fujita <
> fujita.ets...@lab.ntt.co.jp> wrote:
>
>> On 2016/03/23 13:44, Ashutosh Bapat wrote:
>>
>>> An FDW can choose not to use those functions, so I don't see a
>>> connection between scan list having simple Vars and existence of those
>>> functions (actually a single one). But having those function would
>>> minimize the code that each FDW has to write, in case they want those
>>> functions. E.g. we have to translate Var::varno to tableoid in case
>>> that's requested by pulling RTE and then getting oid out from there. If
>>> that functionality is available in the core, 1. the code is not
>>> duplicated 2. every FDW will get the same tableoid. Similarly for the
>>> other columns.
>>>
>>
>> OK.  Then, I'd like to propose a function that would create interger
>> Lists of indexes of tableoids, xids and cids plus
>
>
> Ok,
>
>
>> an OID List of these tableoids,
>
>
> I didn't get this.
>
>
>> in a given fdw_scan_tlist, on the assumption that each expression in the
>> fdw_scan_tlist is a simple Var.
>
>
> I guess this is Ok. In fact, at least for now an expression involving any
> of those columns is not pushable to the foreign server, as the expression
> can not be evaluated there. So, if we come across such a case in further
> pushdowns, we will need to have a different solution for pushing down such
> target lists.
>
>
>> I'd also like to propose another function that would fill system columns
>> using these Lists when creating a scan tuple.
>>
>> Ok.
>
> I had imagined that the code to extract the above lists and filling the
> values in scan tuple will be in FDW. We only provide a function to supply
> those values. But what you propose might actually be much practical.
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Postgres_fdw join pushdown - INNER - FULL OUTER join combination generating wrong result

2016-03-29 Thread Ashutosh Bapat

> Observation:_ Inner join and full outer join combination on a table
>>
>> generating wrong result.
>>
>> SELECT * FROM lt;
>>   c1
>> 
>>1
>>2
>> (2 rows)
>>
>> SELECT * FROM ft;
>>   c1
>> 
>>1
>>2
>> (2 rows)
>>
>> \d+ ft
>>   Foreign table "public.ft"
>>   Column |  Type   | Modifiers | FDW Options | Storage | Stats target |
>> Description
>>
>> +-+---+-+-+--+-
>>   c1 | integer |   | | plain   |  |
>> Server: link_server
>> FDW Options: (table_name 'lt')
>>
>> --inner join and full outer join on local tables
>> SELECT t1.c1,t2.c1,t3.c1 FROM lt t1 INNER JOIN lt t2 ON (t1.c1 = t2.c1)
>> FULL JOIN lt t3 ON (t2.c1 = t3.c1);
>>   c1 | c1 | c1
>> ++
>>1 |  1 |  1
>>2 |  2 |  2
>> (2 rows)
>>
>> --inner join and full outer join on corresponding foreign tables
>> SELECT t1.c1,t2.c1,t3.c1 FROM ft t1 INNER JOIN ft t2 ON (t1.c1 = t2.c1)
>> FULL JOIN ft t3 ON (t2.c1 = t3.c1);
>>   c1 | c1 | c1
>> ++
>>1 |  1 |  1
>>1 |  2 |
>>2 |  1 |
>>2 |  2 |  2
>> (4 rows)
>>
>
Thanks Rajkumar for the detailed report.


>
> I think the reason for that is in foreign_join_ok.  This in that function:
>
> wrongly pulls up remote_conds from joining relations in the FULL JOIN
> case.  I think we should not pull up such conditions in the FULL JOIN case.
>
>
Right. For a full outer join, since each joining relation acts as outer for
the other, we can not pull up the quals to either join clauses or other
clauses. So, in such a case, we will need to encapsulate the joining
relation with conditions into a subquery. Unfortunately, the current
deparse logic does not handle this encapsulation. Adding that functionality
so close to the feature freeze might be risky given the amount of code
changes required.

PFA patch with a quick fix. A full outer join with either of the joining
relations having WHERE conditions (or other clauses) is not pushed down. In
the particular case that was reported, the bug triggered because of the way
conditions are handled for an inner join. For an inner join, all the
conditions in ON as well as WHERE clause are treated like they are part of
WHERE clause. This allows pushing down a join even if there are unpushable
join clauses. But the pushable conditions can be put back into the ON
clause. This avoids using subqueries while deparsing.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 50f1261..5c4ebb6 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -441,31 +441,31 @@ SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1"
  107 | 107
  108 | 108
  109 | 109
  110 | 110
 (10 rows)
 
 -- A join between local table and foreign join. ORDER BY clause is added to the
 -- foreign join so that the local table can be joined using merge join strategy.
 EXPLAIN (COSTS false, VERBOSE)
 	SELECT t1."C 1" FROM "S 1"."T 1" t1 left join ft1 t2 join ft2 t3 on (t2.c1 = t3.c1) on (t3.c1 = t1."C 1") OFFSET 100 LIMIT 10;
- QUERY PLAN 
-
+   QUERY PLAN
+-
  Limit
Output: t1."C 1"
->  Merge Right Join
  Output: t1."C 1"
  Merge Cond: (t3.c1 = t1."C 1")
  ->  Foreign Scan
Output: t3.c1
Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
-   Remote SQL: SELECT r3."C 1" FROM ("S 1"."T 1" r2 INNER JOIN "S 1"."T 1" r3 ON (TRUE)) WHERE ((r2."C 1" = r3."C 1")) ORDER BY r2."C 1" ASC NULLS LAST
+   Remote SQL: SELECT r3."C 1" FROM

Re: [HACKERS] FDW join pushdown and scanclauses

2016-01-13 Thread Ashutosh Bapat

On Thu, Jan 14, 2016 at 9:31 AM, Etsuro Fujita 
wrote:

> On 2016/01/08 22:05, Ashutosh Bapat wrote:
>
>> In add_paths_to_joinrel(), the FDW specific hook GetForeignJoinPaths()
>> is called. This hook if implemented should add ForeignPaths for pushed
>> down joins. create_plan_recurse() calls create_scan_plan() on seeing
>> these paths.
>>
>
> create_scan_plan() generates a list of clauses to be applied on scan
>> from rel->baserestrictinfo and parameterization clauses. This list is
>> passed to create_*scan_plan routine as scanclauses. This code is very
>> specific for single relations scans. Now that we are using
>> create_scan_plan() for creating plan for join relations, it needs some
>> changes so that quals relevant to a join can be passed to
>> create_foreignscan_plan().
>>
>
> Do we really need that?  The relevant join quals are passed to
> GetForeignJoinPaths as extra->restrictlist, so I think we can get those
> quals during GetForeignPlan, by looking at the selected ForeignPath that is
> passed to that function as a parameter.
>

Right! You are suggesting that an FDW should just ignore scan_clauses for
join relation and manage those by itself. That looks fine.


>
> A related problem is in
>> create_foreignscan_plan(), which sets ForeignScan::fsSystemCol if a
>> system column is being used in the targetlist or quals. Right now it
>> only checks rel->baserestrictinfo, which is NULL for a joinrel. Thus in
>> case a system column appears in the joinclauses it will not be considered.
>>
>
> IIUC, we assume that such system columns are assumed to be contained in
> fdw_scan_tlist in the joinrel case.
>

>From another mail thread [1] that you have started, I understood that
fsSystemCol should not be set for a join relation, so the bug doesn't exist.

[1] http://www.postgresql.org/message-id/559f9323.4080...@lab.ntt.co.jp

Thanks for your inputs.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Limit and inherited tables

2016-01-17 Thread Ashutosh Bapat

On Fri, Jan 15, 2016 at 9:47 PM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

>
> This example is lacking indexes on the child tables, which is
>> why the plan shown is about as good as you're going to get.
>> The contents of foo1 and foo2 have to be read in entirety in any
>> case, and sorting them separately is not a win compared to doing
>> a single sort.
>>
> It is true, but not in case of FDW connected to remote host.
> In this case sending large volumes of data through network will be very
> inefficient.
>
> There will be no problem if FDW can provide index scan - in this case
> MergeAppend will fetch only required number of records:
>
> postgres=# explain analyze select * from t order by u limit 1;
>   QUERY PLAN
>
> ---
>  Limit  (cost=300.17..300.23 rows=1 width=8) (actual time=4.588..4.588
> rows=1 loops=1)
>->  Merge Append  (cost=300.17..762.76 rows=7681 width=8) (actual
> time=4.586..4.586 rows=1 loops=1)
>  Sort Key: t.u
>  ->  Index Scan using t_pkey on t  (cost=0.12..8.14 rows=1
> width=8) (actual time=0.003..0.003 rows=0 loops=1)
>  ->  Foreign Scan on t_fdw1  (cost=100.00..193.92 rows=2560
> width=8) (actual time=1.532..1.532 rows=1 loops=1)
>  ->  Foreign Scan on t_fdw2  (cost=100.00..193.92 rows=2560
> width=8) (actual time=1.510..1.510 rows=1 loops=1)
>  ->  Foreign Scan on t_fdw3  (cost=100.00..193.92 rows=2560
> width=8) (actual time=1.535..1.535 rows=1 loops=1)
>
> But if sort is performed by non-indexed fields, then current behaviour
> will be inefficient and can be significantly improved by pushing limits to
> remote hosts.
>
>
Pushing ORDER BY clause to the foreign server is supported with commit
f18c944b6137329ac4a6b2dce5745c5dc21a8578. Check if that helps to get sorted
data from the foreign server. LIMIT pushdown is not supported yet, though.


>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-01-18 Thread Ashutosh Bapat

Hi All,
PFA patches for postgres_fdw join pushdown, taken care of all TODOs in my
last mail.

Here is the list of things that have been improved/added new as compared to
Hanada-san's previous patch at [1].

1. Condition handling for join
Patch in [1] allowed a foreign join to be pushed down if only all the
conditions were safe to push down to the foreign server. This patch
differentiates these conditions into 1. conditions to be applied while
joining (ON clause) 2. conditions to be applied after joining (WHERE
clause). For a join to be safe to pushdown, only conditions in 1 need to be
all safe to pushdown. The conditions in second category, which are not safe
to be pushed down can be applied locally. This allows more avenue for join
pushdown. For an INNER join all the conditions can be applied on the cross
product. Hence we can push down an INNER join even if one or more of the
conditions are not safe to be pushed down. This patch includes the
optimization as well.

2. Targetlist handling:
The columns required to evaluate the non-pushable conditions on a join
relation need to be fetched from the foreign server. In previous patch the
SELECT clauses were built from rel->reltargetlist, which doesn't contain
these columns. This patch includes those columns as well.

3. Column projection:
Earlier patch required another layer of SQL to project whole-row attribute
from a base relation. This patch takes care of that while constructing and
deparsing
targetlist. This reduces the complexity and length of the query to be sent
to the foreign server e.g.

With the projection in previous patch the query looked like
EXPLAIN (COSTS false, VERBOSE)
SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)
ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
  QUERY PLAN


... explain output clipped
   Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, r.a1 FROM (SELECT
l.a7, ROW(l.a10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17), l.a10,
l.a12 FROM (SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7
a16, c8 a17, ctid a7 FROM "S 1"."T 1") l) l (a1, a2, a3, a4) INNER JOIN
(SELECT ROW(r.a9, r.a10, r.a12, r.a13, r.a14, r.a15, r.a16, r.a17), r.a9
FROM (SELECT "C 1" a9, c2 a10, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8
a17 FROM "S 1"."T 1") r) r (a1, a2) ON ((l.a3 = r.a2))

With this patch it looks like
EXPLAIN (COSTS false, VERBOSE)
SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)
ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
   QUERY
PLAN

... explain output clipped
   Remote SQL: SELECT l.a3, l.a4, l.a1, l.a2, r.a2 FROM (SELECT
"C 1", c3, ctid, ROW("C 1", c2, c3, c4, c5, c6, c7, c8) FROM "S 1"."T 1") l
(a1, a2, a3, a4) INNER JOIN (SELECT "C 1", ROW("C 1", c2, c3, c4, c5, c6,
c7, c8) FROM "S 1"."T 1") r (a1, a2) ON (TRUE) WHERE ((l.a1 = r.a1))
(9 rows)

4. Local cost estimation
Previous patch had a TODO left for estimating join cost locally, when
use_remote_estimate is false. This patch adds support for the same. The
relevant
discussion in mail thread [2], [3].

5. This patch adds a GUC enable_foreignjoin to enable or disable join
pushdown through core.

6. Added more tests to test lateral references, unsafe to push conditions
at various places in the query,

Many cosmetic improvements like adding static function declarations,
comment improvements and making code readable.

[1]
http://www.postgresql.org/message-id/CAEZqfEe9KGy=1_wagh2rgzpg0o4pqgd+iauyaj8wtze+cyj...@mail.gmail.com
[2]
http://www.postgresql.org/message-id/cafjfprcqswus+tb5iyp1m3c-w0k3xab6h5mw4+n2q2iuafs...@mail.gmail.com
[3]
http://www.postgresql.org/message-id/CAFjFpRepSC2e3mZ1uYSopJD6R19fOZ0dNNf9Z=gnyksb6wg...@mail.gmail.com

I will be working next on (in that order)
1. eval_plan_qual fix for foreign join. (Considered as a must-have for 9.6)
2. Pushing down ORDER BY clause along with join pushdown
3. Parameterization of foreign join paths (Given the complexity of the
feature this may not make it into 9.6)

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-01-20 Thread Ashutosh Bapat

On Wed, Jan 20, 2016 at 4:58 AM, Robert Haas  wrote:

> On Mon, Jan 18, 2016 at 6:47 AM, Ashutosh Bapat
>  wrote:
> > Thanks Thom for bringing it to my notice quickly. Sorry for the same.
> >
> > Here are the patches.
> >
> > 1. pg_fdw_core_v2.patch: changes in core related to user mapping
> handling,
> > GUC
> >   enable_foreignjoin
>
> I tried to whittle this patch down to something that I'd be
> comfortable committing and ended up with nothing left.
>
> First, I removed the enable_foreignjoin GUC.  I still think an
> FDW-specific GUC is better, and I haven't heard anybody make a strong
> argument the other way. Your argument that this might be inconvenient
> if somebody is using a bunch of join-pushdown-enabled FDWs sounds like
> a strictly theoretical problem, considering how much difficulty we're
> having getting even one FDW to support join pushdown.  And if it does
> happen, the user can script it.  I'm willing to reconsider this point
> if there is a massive chorus of support for having this be a core
> option rather than an FDW option, but to me it seems that we've gone
> to a lot of trouble to make the system extensible and might as well
> get some benefit from it.
>

Ok. Removed.

>
> Second, I removed the documentation for GetForeignTable().  That
> function is already documented and doesn't need re-documenting.
>

Removed.

>
> Third, I removed GetUserMappingById().  As mentioned in the email to
> which I replied earlier, that doesn't actually produce the same result
> as the way we're doing it now, and might leave the user ID invalid.
>

The comment you quoted was my comment :). I never got a reply from
Hanada-san on that comment. A bit of investigation revealed this: A pushed
down foreign join which involves N foreign tables, might have different
effective userid for each of them.
userid = OidIsValid(rte->checkAsUser) ? rte->checkAsUser : GetUserId()
In such a case, AFAIU, the join will be pushed down only if none of those
have user mapping and there is public user mapping. Is that right? In such
a case, which userid should be stored in UserMapping structure?It might
look like setting GetUserId() as userid in UserMapping is wise, but not
really. All the foreign tables might have different effective userids, each
different from GetUserId() and what GetUserId() would return might have a
user mapping different from the effective userids. What userid should
UserMapping structure have in that case? I thought, that's why Hanada-san
used invalid userid there, so left as it is.

But that has an undesirable effect of setting it to invalid, even in case
of base relation or when all the joining relations have same effective
userid. We may cache "any" of the effective userids of joining relations if
the user mapping matches in build_join_rel() (we will need to add another
field userid in RelOptInfo along with umid). While creating the plan use
this userid to get user mapping. Does that sound good? This clubbed with
the plan cache invalidation should be full proof. We need a way to get user
mapping whether from umid (in which case public user mapping will have -1)
or serverid and some userid.

> Even if that were no issue, it doesn't seem to add anything.  The only
> caller of the new function is  postgresBeginForeignScan(), and that
> function already has a way of getting the user mapping.  The new way
> doesn't seem to be better or faster, so why bother changing it?
>

> At this point, I was down to just the changes to store the user
> mapping ID (umid) in the RelOptInfo, and to consider join pushdown
> only if the user mapping IDs match.  One observation I made is that if
> the code to initialize the FDW-related fields were lifted from
> get_relation_info() up to build_simple_rel(), we would not need to use
> planner_rt_fetch(), because the caller already has that information.
> That seems like it might be worth considering.  But then I realized a
> more fundamental problem: making the plan depend on the user ID is a
> problem, because the user ID can be changed, and the plan might be
> cached.  The same issue arises for RLS, but there is provision for
> that in RevalidateCachedQuery.  This patch makes no similar provision.
>

Thanks for the catch. Please see attached patch for a quick fix in
RevalidateCachedQuery(). Are you thinking on similar lines? However, I am
not sure of planUserId - that field actually puzzles me. It's set when the
first time we create a plan and it never changes then. This seems to be a
problem, even for RLS, in following scene

prepare statement using RLS feature
execute statement -- now planUserId is set to say user1
set session role user2;
execute statement - RevalidateCachedQuery() detects that

Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-01-20 Thread Ashutosh Bapat

I missed the example plan cache revalidation patch in the previous mail.
Sorry. Here it is.

On Wed, Jan 20, 2016 at 7:20 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

>
>
> On Wed, Jan 20, 2016 at 4:58 AM, Robert Haas 
> wrote:
>
>> On Mon, Jan 18, 2016 at 6:47 AM, Ashutosh Bapat
>>  wrote:
>> > Thanks Thom for bringing it to my notice quickly. Sorry for the same.
>> >
>> > Here are the patches.
>> >
>> > 1. pg_fdw_core_v2.patch: changes in core related to user mapping
>> handling,
>> > GUC
>> >   enable_foreignjoin
>>
>> I tried to whittle this patch down to something that I'd be
>> comfortable committing and ended up with nothing left.
>>
>> First, I removed the enable_foreignjoin GUC.  I still think an
>> FDW-specific GUC is better, and I haven't heard anybody make a strong
>> argument the other way. Your argument that this might be inconvenient
>> if somebody is using a bunch of join-pushdown-enabled FDWs sounds like
>> a strictly theoretical problem, considering how much difficulty we're
>> having getting even one FDW to support join pushdown.  And if it does
>> happen, the user can script it.  I'm willing to reconsider this point
>> if there is a massive chorus of support for having this be a core
>> option rather than an FDW option, but to me it seems that we've gone
>> to a lot of trouble to make the system extensible and might as well
>> get some benefit from it.
>>
>
> Ok. Removed.
>
>
>>
>> Second, I removed the documentation for GetForeignTable().  That
>> function is already documented and doesn't need re-documenting.
>>
>
> Removed.
>
>
>>
>> Third, I removed GetUserMappingById().  As mentioned in the email to
>> which I replied earlier, that doesn't actually produce the same result
>> as the way we're doing it now, and might leave the user ID invalid.
>>
>
> The comment you quoted was my comment :). I never got a reply from
> Hanada-san on that comment. A bit of investigation revealed this: A pushed
> down foreign join which involves N foreign tables, might have different
> effective userid for each of them.
> userid = OidIsValid(rte->checkAsUser) ? rte->checkAsUser : GetUserId()
> In such a case, AFAIU, the join will be pushed down only if none of those
> have user mapping and there is public user mapping. Is that right? In such
> a case, which userid should be stored in UserMapping structure?It might
> look like setting GetUserId() as userid in UserMapping is wise, but not
> really. All the foreign tables might have different effective userids, each
> different from GetUserId() and what GetUserId() would return might have a
> user mapping different from the effective userids. What userid should
> UserMapping structure have in that case? I thought, that's why Hanada-san
> used invalid userid there, so left as it is.
>
> But that has an undesirable effect of setting it to invalid, even in case
> of base relation or when all the joining relations have same effective
> userid. We may cache "any" of the effective userids of joining relations if
> the user mapping matches in build_join_rel() (we will need to add another
> field userid in RelOptInfo along with umid). While creating the plan use
> this userid to get user mapping. Does that sound good? This clubbed with
> the plan cache invalidation should be full proof. We need a way to get user
> mapping whether from umid (in which case public user mapping will have -1)
> or serverid and some userid.
>
>
>> Even if that were no issue, it doesn't seem to add anything.  The only
>> caller of the new function is  postgresBeginForeignScan(), and that
>> function already has a way of getting the user mapping.  The new way
>> doesn't seem to be better or faster, so why bother changing it?
>>
>
>> At this point, I was down to just the changes to store the user
>> mapping ID (umid) in the RelOptInfo, and to consider join pushdown
>> only if the user mapping IDs match.  One observation I made is that if
>> the code to initialize the FDW-related fields were lifted from
>> get_relation_info() up to build_simple_rel(), we would not need to use
>> planner_rt_fetch(), because the caller already has that information.
>> That seems like it might be worth considering.  But then I realized a
>> more fundamental problem: making the plan depend on the user ID is a
>> problem, because the user ID can be changed, and the plan might be
>> cached.  The same issue arises for RLS, but there is provision for
>> that in

Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-01-20 Thread Ashutosh Bapat

On Thu, Jan 21, 2016 at 2:07 AM, Robert Haas  wrote:

>
> Well, we kind of need to get it right, not just be guessing.
>
> It looks to me as though GetConnection() only uses the user ID as a
> cache lookup key.  What it's trying to do is ensure that if user A and
> user B need different user mapping options, we don't use the same
> connection for both.  So, actually, passing InvalidOid seems like it's
> not really a problem here.  It's arguably more correct than what we've
> been doing up until now, since it means we cache the connection under
> user OID whose options we used, rather than the user OID that asked
> about the options.
>
>
In that case, do we need GetUserMappingById changes in that patch and not
pull it out. If we are keeping those changes, I need some clarification
about your comment

Even if that were no issue, it doesn't seem to add anything.  The only
> caller of the new function is  postgresBeginForeignScan(), and that
> function already has a way of getting the user mapping.  The new way
> doesn't seem to be better or faster, so why bother changing it?
>

In pg_fdw_join_v2.patch, postgresBeginForeignScan() obtained user mapping
using GetUserMappingById() instead of the earlier way of fetching it by
userid and serverid. So even that change will remain, right?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-01-21 Thread Ashutosh Bapat

On Thu, Jan 21, 2016 at 3:03 AM, Robert Haas  wrote:

> On Mon, Jan 18, 2016 at 6:47 AM, Ashutosh Bapat
>  wrote:
> > 2. pg_fdw_join_v2.patch: postgres_fdw changes for supporting join
> pushdown
>
> The very first hunk of this patch contains annoying whitespace
> changes.  Even if the result is what pgindent is going to do anyway,
> such changes should be stripped out of patches for ease of review.  In
> this case, though, I'm pretty sure it isn't what pgindent is going to
> do, so it's just useless churn.  Please remove all such changes from
> the patch.
>

Done.

>
> find_var_pos() looks like it should just be inlined into its only caller.
>
> Node *node = (Node *) var;
> TargetListEntry *tle = tlist_member(node, context->outerlist);
> if (tle)
> {
> side = OUTER_ALIAS;
> pos = tle->resno;
> }
> else
> {
> side = INNER_ALIAS;
> tle = tlist_member(node, context->innertlist);
> pos = tle->resno;
> }
>
> Why are we calling the return value "pos" instead of "resno" as we
> typically do elsewhere?
>

I have rewritten deparseJoinVar as
/*
 * Deparse given Var required for a joinrel into buf.
 */
static void
deparseJoinVar(Var *node, deparse_expr_cxt *context)
{
char*side;
TargetEntry *tle;

/* Lookup outer side */
tle = tlist_member((Node *)node, context->outertlist);
if (tle)
side = OUTER_ALIAS;
else
{
/* Not found on outer side; lookup inner */
side = INNER_ALIAS;
tle = tlist_member((Node *)node, context->innertlist);
}

/* The input var should be either on left or right side */
Assert(tle && side);

appendStringInfo(context->buf, "%s.%s%d", side, COL_ALIAS_PREFIX,
tle->resno);
}

> get_jointype_name() would probably be better written as a switch.

Done.

> On
> the flip side, deparseColumnRef() would have a lot less code churn if
> it *weren't* written as a switch.
>

Done.

>
> What this patch does to the naming and calling conventions in
> deparse.c does not good.  Previously, we had deparseTargetList().
> Now, we sometimes use that and sometimes deparseAttrsUsed() for almost
> the same thing.

Previously deparseTargetList deparsed the SELECT or RETURNING clause by
including list of name of attributes provided by attrs_used. That's now
done by deparseAttrsUsed(). In current path deparseTargetList() deparses
the targetlist i.e. list of TargetEntry nodes (right now only Vars).
Although these two functions return comma separated string of column names,
their inputs are different. deparseAttrsUsed() can never be called for more
than one base relation. deparseTargetList() on the other hand can deparse a
targetlist with Var nodes from multiple relations. We need those two
functionalities separately. We might convert attrs_used into a list of
TargetEntry nodes using build_tlist_to_deparse() and use deparseTargetList
everywhere. A side effect of that would be separating retrieved_attrs
collection from deparsing code. I didn't do that change in this version to
avoid large code changes. But I am fine doing that, if that makes code
readable.

If we have to keep old deparseTargetList as is (and don't rename it as
deparseAttrsUsed), we can rename the new deparseTargetList as something
different may be deparseSelectList. I am fine with that too. But we need
the later functionality, whatever its name be.

> Previously, we had deparseColumnRef(); now we have
> both that and deparseJoinVar() doing very similar things.  But in each
> case, the function names are not parallel and the calling conventions
> are totally different.  Like here:
>
> +   if (context->foreignrel->reloptkind == RELOPT_JOINREL)
> +   deparseJoinVar(node, context);
> +   else
> +   deparseColumnRef(buf, node->varno,
> node->varattno, context->root);
>
> We pass the buf separately to deparseColumnRef(), but for some reason
> not to deparseJoinVar().I wonder if these functions need to be two
> separate things or if the work done by deparseJoinVar() should
> actually be part of deparseColumnRef().  But even if it needs to be
> separate, I wonder why we can't arrange things so that they get the
> same arguments, more or less.
>

deparseVar() is called for any Var node that's encountered. deparseJoinVar
is called to deparse a Var from join relation, which is supposed to output
INNER or OUTER var's alias as used in INNER or OUTER subqueries. It does
not output the real column names. deparseColumnRef() however is the same
old thing; it deparses column of given base relation. They are not similar
things.

deparseJoinVar gets its buf from context, which we do not pass to
deparseColu

Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-01-21 Thread Ashutosh Bapat

> > In such a
> > case, which userid should be stored in UserMapping structure?It might
> look
> > like setting GetUserId() as userid in UserMapping is wise, but not
> really.
> > All the foreign tables might have different effective userids, each
> > different from GetUserId() and what GetUserId() would return might have a
> > user mapping different from the effective userids. What userid should
> > UserMapping structure have in that case? I thought, that's why Hanada-san
> > used invalid userid there, so left as it is.
>
> Well, we kind of need to get it right, not just be guessing.
>
> It looks to me as though GetConnection() only uses the user ID as a
> cache lookup key.  What it's trying to do is ensure that if user A and
> user B need different user mapping options, we don't use the same
> connection for both.  So, actually, passing InvalidOid seems like it's
> not really a problem here.  It's arguably more correct than what we've
> been doing up until now, since it means we cache the connection under
> user OID whose options we used, rather than the user OID that asked
> about the options.
>

That means that, right now, for two different local users which use public
user mapping we will be creating two different connections to the foreign
server with the same credentials. That doesn't look good. If we obtained
user mapping using user mapping oid (which will have invalid user id for
public user mapping) and used userid from that structure, we will get rid
of this problem.


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

[HACKERS] Using user mapping OID as hash key for connection hash

2016-01-27 Thread Ashutosh Bapat

Hi All,
As discussed in postgres_fdw join pushdown thread [1], for two different
effective local users which use public user mapping we will be creating two
different connections to the foreign server with the same credentials.

Robert suggested [2] that we should use user mapping OID as the connection
cache key instead of current userid and serverid.

There are two patches attached here:
1. pg_fdw_concache.patch.short - shorter version of the fix. Right now
ForeignTable, ForeignServer have corresponding OIDs saved in these
structures. But UserMapping doesn't. Patch adds user mapping OID as a
member to this structure. This member is then used as key in
GetConnection().
2. pg_fdw_concache.patch.large - most of the callers of GetConnection() get
ForeignServer object just to pass it to GetConnection(). GetConnection can
obtain ForeignServer by using serverid from UserMapping and doesn't need
ForeignServer to be an argument. Larger version of patch has this change.

GetConnection has named the UserMapping argument as just "user", ideally it
should have been named user_mapping. But that seems to be too obvious to be
unintentional. So, I have left that change.

The patch has added userid and user mapping oid to a debug3 message in
GetConnection(). the message is displayed when a new connection to foreign
server is created. With only that change, if we run script multi_conn.sql
(attached) we see that it's creating two connections when same user mapping
is used. Also attached is the output of the same script run on my setup.
Since min_messages is set to DEBUG3 there are too many unrelated messages.
So, search for "new postgres_fdw connection .." for new connection messages.

I have included the changes to the DEBUG3 message in GetConnection(), since
it may be worth committing those changes. In case, reviewers/committers
disagree, those chagnes can be removed.

[1]
http://www.postgresql.org/message-id/CAFjFpRf-LiD5bai4D6cSUseJh=xxjqipo_vn8mtnzg16tmw...@mail.gmail.com
[2]
http://www.postgresql.org/message-id/ca+tgmoymmv_du-vppq1d7ufsjaopbq+lgpxtchnuqfobjg2...@mail.gmail.com
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

pg_fdw_concache.patch.large
Description: Binary data

pg_fdw_concache.patch.short
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-01-29 Thread Ashutosh Bapat

On Fri, Jan 29, 2016 at 9:51 AM, Robert Haas  wrote:

> On Thu, Jan 28, 2016 at 11:26 AM, Ashutosh Bapat
>  wrote:
> > 2. pg_fdw_join_v3.patch: changes to postgres_fdw - more description below
>
> This patch no longer quite applies because of conflicts with one of
> your other patches that I applied today (cf. commit
> fbe5a3fb73102c2cfec114a67943f4474383).
>
> And then I broke it some more by committing a patch to extract
> deparseLockingClause from postgresGetForeignPlan and move it to
> deparse.c, but that should pretty directly reduce the size of this
> patch.  I wonder if there are any other bits of refactoring of that
> sort that we can do in advance of landing the main patch, just to
> simplify review.  But I'm not sure there are: this patch removes very
> little existing code; it just adds a ton of new stuff.
>

PFA patch to move code to deparse SELECT statement into a function
deparseSelectStmtForRel(). This code is duplicated in
estimate_path_cost_size() and postgresGetForeignPlan(), so moving it into a
function avoids that duplication. As a side note, estimate_path_cost_size()
doesn't add FOR SHARE/UPDATE clause to the statement being EXPLAINed, even
if the actual statement during execution would have it. This difference
looks unintentional to me. This patch corrects it as well.
appendOrderByClause and appendWhereClause both create a context within
themselves and pass it to deparseExpr. This patch creates the context once
in deparseSelectStmtForRel() and then passes it to the other deparse
functions avoiding repeated context creation.

> copy_path_for_epq_recheck() and friends are really ugly.  Should we
> consider just adding copyObject() support for those node types
> instead?
>

the note in copyfuncs.c says
 * We also do not support copying Path trees, mainly
 * because the circular linkages between RelOptInfo and Path nodes can't
 * be handled easily in a simple depth-first traversal.

We may avoid that by just copying the pointer to RelOptInfo and not copy
the whole RelOptInfo. The other problem is paths in epq_paths will be
copied as many times as the number of 2-way joins pushed down. Let me give
it a try and produce patch with that.

I'm sure there's more -- this is a huge patch and I don't fully
> understand it yet -- but I'm out of energy for tonight.
>
>
Thanks a lot for your comments and moving this patch forward.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

pg_fdw_deparse_select.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-01-29 Thread Ashutosh Bapat

On Fri, Jan 29, 2016 at 2:05 PM, Etsuro Fujita 
wrote:

> On 2016/01/29 1:26, Ashutosh Bapat wrote:
>
>> Here's an updated version of the previous patches, broken up like before
>>
>
> 2. pg_fdw_join_v3.patch: changes to postgres_fdw - more description below
>>
>
> Here is the summary of changes from the last set of patches
>>
>
> 2. Included fix for EvalPlanQual in postgres_fdw - an alternate local
>> path is obtained from the list of paths linked to the joinrel. Since
>> this is done before adding the ForeignPath, we should be a local path
>> available for given join.
>>
>
> I looked at that code in the patch (ie, postgresRecheckForeignScan and the
> helper function that creates a local join path for a given foreign join
> path.), briefly.  Maybe I'm missing something, but I think that is
> basically the same as the fix I proposed for addressing this issue, posted
> before [1], right?


Yes, although I have added functions to copy the paths, not consider
pathkeys and change the relevant members of the paths. Sorry  if I have
missed giving you due credits.


>   If so, my concern is, the helper function probably wouldn't extend to
> the parameterized-foreign-join-path cases, though that would work well for
> the unparameterized-foreign-join-path cases.  We don't support
> parameterized-foreign-join paths for 9.6?
>
>
If we do not find a local path with given parameterization, it means there
are other local parameterized paths which are superior to it. This possibly
indicates that there will be foreign join parameterised paths which are
superior to this parameterized path, so we basically do not create foreign
join path with that parameterization.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-02 Thread Ashutosh Bapat

On Tue, Feb 2, 2016 at 5:18 AM, Robert Haas  wrote:

> On Mon, Feb 1, 2016 at 8:27 AM, Ashutosh Bapat
>  wrote:
> > Here are patches rebased on recent commit
> > cc592c48c58d9c1920f8e2063756dcbcce79e4dd. Renamed original
> deparseSelectSql
> > as deparseSelectSqlForBaseRel and added deparseSelectSqlForJoinRel to
> > construct SELECT and FROM clauses for base and join relations.
> >
> > pg_fdw_core_v5.patch GetUserMappingById changes
> > pg_fdw_join_v5.patch: postgres_fdw changes for join pushdown including
> > suggestions as described below
> > pg_join_pd_v5.patch: combined patch for ease of testing.
> >
> > The patches also have following changes along with the changes described
> in
> > my last mail.
> > 1. Revised the way targetlists are handled. For a bare base relation the
> > SELECT clause is deparsed from fpinfo->attrs_used but for a base relation
> > which is part of join relation, the expected targetlist is passed down to
> > deparseSelectSqlForBaseRel(). This change removed 75 odd lines in
> > build_tlist_to_deparse() which were very similar to
> > deparseTargetListFromAttrsUsed() in the previous patch.
>
> Nice!
>
> > 2. Refactored postgresGetForeignJoinPaths to be more readable moving the
> > code to assess safety of join pushdown into a separate function.
>
> That looks good.  But maybe call the function foreign_join_ok() or
> something like that?  is_foreign_join() isn't terrible but it sounds a
> little odd to me.
>

I used name is_foreign_join(), which assesses push-down safety of a join,
to have similar naming convention with is_foreign_expr(), which checks
push-down safety of an expression. But foreign_join_ok() is fine too. Used
that.

>
> The path-copying stuff in get_path_for_epq_recheck() looks a lot
> better now, but you neglected to add a comment explaining why you did
> it that way (e.g. "Make a shallow copy of the join path, because the
> planner might free the original structure after a future add_path().
> We don't need to copy the substructure, though; that won't get freed."
>

I alluded to that in the second sentence of comment
3259  * Since we will need to replace any foreign paths for join with their
alternate
3260  * paths, we need make a copy of the local path chosen. Also, that
helps in case
3261  * the planner chooses to throw away the local path.

But that wasn't as clear as your wording. Rewrote the paragraph using your
wording.
3259  * Since we will need to replace any foreign paths for join with their
alternate
3260  * paths, we need make a copy of the local path chosen. Make a shallow
copy of
3261  * the join path, because the planner might free the original
structure after a
3262  * future add_path(). We don't need to copy the substructure, though;
that won't
3263  * get freed.

>  I would forget about setting merge_path->materialize_inner = false;
> that doesn't seem essential.

Done.

> Also, I would arrange things so that if
> you hit an unrecognized path type (like a custom join, or a gather)
> you skip that particular path instead of erroring out.

Ok. Done.

> I think this
> whole function should be moved to core,

I have moved the function to foreign.c where most of the FDW APIs are
located and declared it in fdwapi.h. Since the function deals with the
paths, I thought of adding it to some path related file, but since it's a
helper function that an FDW can use, I thought foreign.c would be better. I
have also added documentation in fdwhandler.sgml. I have renamed the
function as GetPathForEPQRecheck() in order to be consistent with other FDW
APIs. In the description I have just mentioned copy of a local path. I am
not sure whether we should say "shallow copy".

> and I think the argument
> should be a RelOptInfo * rather than a List *.
>

Done.

>
> + * We can't know VERBOSE option is specified or not, so always add
> shcema
>
> We can't know "whether" VERBOSE...
> shcema -> schema
>
>
Done.

> + * the join relaiton is already considered, so that we won't waste
> time in
>
> Typo.
>
>
Done.

> + * judging safety of join pushdow and adding the same paths again if
> found
>
> Typo.
>
> Done.

Sorry for those typos.

Attaching patches with reply to your next mail.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] [POC] FETCH limited by bytes.

2016-02-03 Thread Ashutosh Bapat

There seems to be double successive assignment to fdw_private in recent
commit. Here's patch to remove the first one.

On Wed, Feb 3, 2016 at 7:40 PM, Robert Haas  wrote:

> On Tue, Feb 2, 2016 at 10:42 PM, Corey Huinker 
> wrote:
> >> I don't see how.  There really is no declaration in there for a
> >> variable called server.
> >
> > Absolutely correct. My only guess is that it was from failing to make
> clean
> > after a checkout/re-checkout. A good reason to have even boring
> regression
> > tests.
> >
> > Regression tests added.
>
> Committed.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/contrib/postgres_fdw/postgres_fdw.c 
b/contrib/postgres_fdw/postgres_fdw.c
index d5c0383..c95ac05 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1009,22 +1009,20 @@ postgresGetForeignPlan(PlannerInfo *root,
 * expressions to be sent as parameters.
 */
initStringInfo(&sql);
deparseSelectStmtForRel(&sql, root, baserel, remote_conds,

best_path->path.pathkeys, &retrieved_attrs,
¶ms_list);
/*
 * Build the fdw_private list that will be available to the executor.
 * Items in the list must match enum FdwScanPrivateIndex, above.
 */
-   fdw_private = list_make2(makeString(sql.data),
-retrieved_attrs);
fdw_private = list_make3(makeString(sql.data),
 retrieved_attrs,
 
makeInteger(fpinfo->fetch_size));
 
/*
 * Create the ForeignScan node from target list, filtering expressions,
 * remote parameter expressions, and FDW private information.
 *
 * Note that the remote parameter expressions are stored in the 
fdw_exprs
 * field of the finished plan node; we can't keep them in private state

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-04 Thread Ashutosh Bapat

> * Is it safe to replace outerjoinpath with its fdw_outerpath the following
> way?  I think that if the join relation represented by outerjoinpath has
> local conditions that can't be executed remotely, we have to keep
> outerjoinpath in the path tree; we will otherwise fail to execute the local
> conditions.  No?
>
> +   /*
> +* If either inner or outer path is a ForeignPath
> corresponding to
> +* a pushed down join, replace it with the
> fdw_outerpath, so that we
> +* maintain path for EPQ checks built entirely of
> local join
> +* strategies.
> +*/
> +   if (IsA(joinpath->outerjoinpath, ForeignPath))
> +   {
> +   ForeignPath *foreign_path;
> +   foreign_path = (ForeignPath
> *)joinpath->outerjoinpath;
> +   if (foreign_path->path.parent->reloptkind
> == RELOPT_JOINREL)
> +   joinpath->outerjoinpath =
> foreign_path->fdw_outerpath;
> +   }
>
>
all the conditions (local and remote) should be part of fdw_outerpath as
well, since that's the alternate local path, which should produce (when
converted to the plan) the same result as the foreign path. fdw_outerpath
should be a local path set when paths for outerjoinpath->parent was being
created. Am I missing something?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-04 Thread Ashutosh Bapat

>
>If so, my concern is, the helper function probably wouldn't
>> extend to the parameterized-foreign-join-path cases, though that
>> would work well for the unparameterized-foreign-join-path cases.  We
>> don't support parameterized-foreign-join paths for 9.6?
>>
>
> If we do not find a local path with given parameterization, it means
>> there are other local parameterized paths which are superior to it. This
>> possibly indicates that there will be foreign join parameterised paths
>> which are superior to this parameterized path, so we basically do not
>> create foreign join path with that parameterization.
>>
>
> The latest version of the postgres_fdw join pushdown patch will support
> only the unparameterized-path case, so we don't have to consider this, but
> why do you think the superiority of parameterizations is preserved between
> remote joining and local joining?
>

AFAIU, parameterization for local paths bubbles up from base relations. For
foreign relations, we calculate the cost of parameterization when
use_remote_estimate is ON, which means it's accurate. So, except that we
will get clause selectivity wrong (if foreign tables were analyzed
regularly even that won't be the case, I guess) resulting in some small
sway in the costs as compared to those of parameterized foreign join paths.
So, I am guessing that the local estimates for parameterized join paths
would be closer to parameterized foreign paths (if we were to produce
those). Hence my statement. There is always a possibility that those two
costs are way too different, hence I have used phrase "possibly" there. I
could be wrong.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-04 Thread Ashutosh Bapat

On Thu, Feb 4, 2016 at 3:28 PM, Etsuro Fujita 
wrote:

> On 2016/02/04 17:58, Etsuro Fujita wrote:
>
>> On 2016/02/04 8:00, Robert Haas wrote:
>>
>>> On Wed, Feb 3, 2016 at 5:56 PM, Robert Haas 
>>> wrote:
>>>
>>>> On Wed, Feb 3, 2016 at 12:08 PM, Ashutosh Bapat
>>>>  wrote:
>>>>
>>>>> PFA patches with naming conventions similar to previous ones.
>>>>> pg_fdw_core_v7.patch: core changes
>>>>> pg_fdw_join_v7.patch: postgres_fdw changes for join pushdown
>>>>> pg_join_pd_v7.patch: combined patch for ease of testing.
>>>>>
>>>>
> One more: I think the following in postgresGetForeignJoinPaths() is a good
> idea, but I think it's okay to just check whether root->rowMarks is
> non-NIL, because that since we have rowmarks for all base relations except
> the target, if we have root->parse->commandType==CMD_DELETE (or
> root->parse->commandType==CMD_UPDATE), then there would be at least one
> non-target base relation in the joinrel, which would have a rowmark.
>
>
Sorry, I am unable to understand it fully. But what you are suggesting that
if there are root->rowMarks, then we are sure that there is at least one
base relation apart from the target, which needs locking rows. Even if we
don't have one, still changes in a row of target relation after it was
scanned, can result in firing EPQ check, which would need the local plan to
be executed, thus even if root->rowMarks is NIL, EPQ check can fire and we
will need alternate local plan.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-04 Thread Ashutosh Bapat

On Thu, Feb 4, 2016 at 4:30 AM, Robert Haas  wrote:

> On Wed, Feb 3, 2016 at 5:56 PM, Robert Haas  wrote:
> > On Wed, Feb 3, 2016 at 12:08 PM, Ashutosh Bapat
> >  wrote:
> >> PFA patches with naming conventions similar to previous ones.
> >> pg_fdw_core_v7.patch: core changes
> >> pg_fdw_join_v7.patch: postgres_fdw changes for join pushdown
> >> pg_join_pd_v7.patch: combined patch for ease of testing.
> >
> > Hmm, I think that GetPathForEPQRecheck() is a pretty terrible name.
> > How about GetExistingJoinPath()?
>

GetExistingLocalJoinPath()? Used that.

>
> Oops.  Hit Send too soon.  Also, how about writing if
> (path->param_info != NULL) continue; instead of burying the core of
> the function in another level of indentation?

Hmm. Done.

> I think you should skip
> paths that aren't parallel_safe, too, and the documentation should be
> clear that this will find an unparameterized, parallel-safe joinpath
> if one exists.
>

A query with FOR UPDATE/SHARE will be considered parallel unsafe in
has_parallel_hazard_walker() and root->glob->parallelModeOK will be marked
false. This implies that none of the base relations and hence join
relations will be marked as consider_parallel. IIUC your logic, none of the
queries with FOR UPDATE/SHARE will get a local path which is marked
parallel_safe and thus join will not be pushed down. Why do you think we
need to skip paths that aren't parallel_safe? I have left aside this change
in the latest patches.

>
> +   ForeignPath *foreign_path;
> +   foreign_path = (ForeignPath
> *)joinpath->outerjoinpath;
>
> Maybe insert a blank line between here, and in the other, similar case.
>

Done.

Patches attached with the previous mail.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-05 Thread Ashutosh Bapat

> Btw, IIUC, I think the patch fails to adjust the targetlist of the top
> plan created that way, to output the fdw_scan_tlist, as discussed in [1]
> (ie, I think the attached patch is needed, which is created on top of your
> patch pg_fdw_join_v8.patch).
>
>
fdw_scan_tlist represents the output fetched from the foreign server and is
not necessarily the output of ForeignScan. ForeignScan node's output is
represented by tlist argument to.

1119 return make_foreignscan(tlist,
1120 local_exprs,
1121 scan_relid,
1122 params_list,
1123 fdw_private,
1124 fdw_scan_tlist,
1125 remote_exprs,
1126 outer_plan);

This tlist is built using build_path_tlist() for all join plans. IIUC, all
of them output the same targetlist. We don't need to make sure that
targetlist match as long as we are using the targetlist passed in by
create_scan_plan(). Do you have a counter example?
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-05 Thread Ashutosh Bapat

Would it be nuts to set fdw_scan_tlist in all cases?  Would the code
> come out simpler than what we have now?
>
>
PFA the patch that can be applied on v9 patches.

If there is a whole-row reference for base relation, instead of adding that
as an additional column deparseTargetList() creates a list of all the
attributes of that foreign table . The whole-row reference gets constructed
during projection. This saves some network bandwidth while transferring the
data from the foreign server. If we build the target list for base
relation, we should include Vars for all the columns (like
deparseTargetList). Thus we borrow some code from deparseTargetList to get
all the attributes of a relation. I included this logic in function
build_tlist_from_attrs_used(), which is being called by
build_tlist_to_deparse(). So, before calling deparseSelectStmtForRel() the
callers can just call build_tlist_to_deparse() and pass the targetlist to
deparseSelectStmtForRel() and use the same to be handed over to the core
code as fdw_scan_tlist. build_tlist_to_deparse() also constructs
retrieved_attrs list, so that doesn't need to be passed around in deparse
routines.

But we now have similar code in three places deparseTargetList(),
deparseAnalyzeSql() and build_tlist_from_attrs_used(). So, I re-wrote
deparseTargetList() (which is now used to deparse returning list) to call
build_tlist_from_attrs_used() followed by deparseExplicitTargetList(). The
later and its minion deparseVar requires a deparse_expr_cxt to be passed.
deparse_expr_cxt has a member to store RelOptInfo of the relation being
deparsed. The callers of deparseReturningList() do not have it and thus
deparse_expr_cxt required changes, which in turn required changes in other
places. All in all, a larger refactoring. It touches more places than
necessary for foreign join push down. So, I think, we should try that as a
separate refactoring work. We may just do the work described in the first
paragraph for join pushdown, but we will then be left with duplicate code,
which I don't think is worth the output.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index fb72f45..7a2a67b 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -93,9 +93,10 @@ typedef struct foreign_loc_cxt
 typedef struct deparse_expr_cxt
 {
PlannerInfo *root;  /* global planner state */
-   RelOptInfo *foreignrel; /* the foreign relation we are planning 
for */
+   Relids  relids; /* Relids for which to deparse 
*/
StringInfo  buf;/* output buffer to append to */
List  **params_list;/* exprs that will become remote Params 
*/
+   boolis_joinrel; /* Are we deparsing for a join 
relation */
 } deparse_expr_cxt;
 
 #define REL_ALIAS_PREFIX   "r"
@@ -122,8 +123,7 @@ static void deparseTargetList(StringInfo buf,
  Bitmapset *attrs_used,
  List **retrieved_attrs,
  bool qualify_col);
-static void deparseExplicitTargetList(List *tlist, List **retrieved_attrs,
- deparse_expr_cxt *context);
+static void deparseExplicitTargetList(List *tlist, deparse_expr_cxt *context);
 static void deparseReturningList(StringInfo buf, PlannerInfo *root,
 Index rtindex, Relation rel,
 bool trig_after_row,
@@ -151,13 +151,15 @@ static void printRemoteParam(int paramindex, Oid 
paramtype, int32 paramtypmod,
 deparse_expr_cxt *context);
 static void printRemotePlaceholder(Oid paramtype, int32 paramtypmod,
   deparse_expr_cxt *context);
-static void deparseSelectSql(List *tlist, List **retrieved_attrs,
-deparse_expr_cxt *context);
+static void deparseSelectSql(List *tlist, RelOptInfo *rel,
+deparse_expr_cxt 
*context);
 static void deparseLockingClause(deparse_expr_cxt *context);
 static void appendOrderByClause(List *pathkeys, deparse_expr_cxt *context);
 static void appendConditions(List *exprs, deparse_expr_cxt *context);
 static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
RelOptInfo *joinrel, bool use_alias, 
List **params_list);
+static List *build_tlist_from_attrs_used(Index rtindex, Bitmapset *attrs_used,
+   PlannerInfo *root, List 
**retrieved_attrs);
 
 
 /*
@@ -715,26 +717,119 @@ deparse_type_name(Oid type_oid, int32 typemod)
 }
 
 /*
+ * Convert input bitmap representation of columns into targetlist of
+ * correspondi

Re: [HACKERS] Use %u to print user mapping's umid and userid

2016-02-08 Thread Ashutosh Bapat

Sorry to come to this late.

The userid being printed is from UserMapping. The new API
GetUserMappingById() allows an FDW to get user mapping by its OID. This API
is intended to be used by FDWs to fetch the user mapping inferred by the
core for given join between foreign relations. In such user mapping object
, userid may be -1 for a public user mapping. I think using %u for -1 will
print it as largest integer. Would that create confusion for users?

On Mon, Feb 8, 2016 at 9:37 PM, Tom Lane  wrote:

> Etsuro Fujita  writes:
> > Here is a patch to use %u not %d to print umid and userid.
>
> Pushed, thanks.
>
> regards, tom lane
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Use %u to print user mapping's umid and userid

2016-02-08 Thread Ashutosh Bapat

Sorry, I was wrong. For public user mapping userid is 0 (InvalidOid), which
is returned as is in UserMapping object. I confused InvalidOid with -1.
Sorry for the confusion.

On Tue, Feb 9, 2016 at 10:21 AM, Tom Lane  wrote:

> Ashutosh Bapat  writes:
> > Sorry to come to this late.
> > The userid being printed is from UserMapping. The new API
> > GetUserMappingById() allows an FDW to get user mapping by its OID. This
> API
> > is intended to be used by FDWs to fetch the user mapping inferred by the
> > core for given join between foreign relations. In such user mapping
> object
> > , userid may be -1 for a public user mapping.
>
> If that is actually how it works, it's broken and I'm going to insist
> on a redesign.  There is nothing anywhere that says that 0x
> is not a valid OID.
>
>     regards, tom lane
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-09 Thread Ashutosh Bapat

Yay, finally!

Thanks.

On Wed, Feb 10, 2016 at 12:46 AM, Robert Haas  wrote:

> On Tue, Feb 9, 2016 at 8:39 AM, Ashutosh Bapat
>  wrote:
> > Thanks Jeevan for your review and comments. PFA the patch which fixes
> those.
>
> Committed with a couple more small adjustments.
>
> Woohoo, finally!
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] [COMMITTERS] pgsql: postgres_fdw: Push down joins to remote servers.

2016-02-09 Thread Ashutosh Bapat

Sorry for the trouble. Thanks Robert for fixing it.

On Wed, Feb 10, 2016 at 2:12 AM, Robert Haas  wrote:

> On Tue, Feb 9, 2016 at 3:10 PM, Tom Lane  wrote:
> > Robert Haas  writes:
> >> postgres_fdw: Push down joins to remote servers.
> >
> > The buildfarm is not very impressed with this patch.
>
> Well, I guess that's why we have regression tests.  It looks like that
> test case introduces a dependency on particular CTID values that
> wasn't there before.  That doesn't seem like a very good idea; I'll
> remove that test.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-10 Thread Ashutosh Bapat

Here's patch to remove this declaration. The Assert next probably prevents
the warning for build with asserts. But both those lines are not needed.

On Wed, Feb 10, 2016 at 12:01 PM, Jeff Janes  wrote:

> On Tue, Feb 9, 2016 at 11:16 AM, Robert Haas 
> wrote:
> > On Tue, Feb 9, 2016 at 8:39 AM, Ashutosh Bapat
> >  wrote:
> >> Thanks Jeevan for your review and comments. PFA the patch which fixes
> those.
> >
> > Committed with a couple more small adjustments.
>
> I'm getting a compiler warning which I think is coming from this commit.
>
> postgres_fdw.c: In function 'fetch_more_data':
> postgres_fdw.c:2526:17: warning: unused variable 'fsplan'
> [-Wunused-variable]
> ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
>
> Thanks,
>
> Jeff
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


extra_fsplan.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-15 Thread Ashutosh Bapat

Thanks Fujita-san for bug report and the fix. Sorry for bug.

Here's patch with better way to fix it. I think while concatenating the
lists, we need to copy the lists being appended and in all the cases. If we
don't copy, a change in those lists can cause changes in the upward
linkages and thus lists of any higher level joins.

On Mon, Feb 15, 2016 at 1:10 PM, Etsuro Fujita 
wrote:

> On 2016/02/10 4:16, Robert Haas wrote:
>
>> On Tue, Feb 9, 2016 at 8:39 AM, Ashutosh Bapat
>>  wrote:
>>
>>> Thanks Jeevan for your review and comments. PFA the patch which fixes
>>> those.
>>>
>>
> Committed with a couple more small adjustments.
>>
>
> Thanks for working on this, Robert, Ashutosh, and everyone involved!
>
> I happened to notice that this code in foreign_join_ok():
>
> switch (jointype)
> {
> case JOIN_INNER:
> fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
>fpinfo_i->remote_conds);
> fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
>fpinfo_o->remote_conds);
> break;
>
> case JOIN_LEFT:
> fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
>   fpinfo_i->remote_conds);
> fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
>fpinfo_o->remote_conds);
> break;
>
> case JOIN_RIGHT:
> fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
>   fpinfo_o->remote_conds);
> fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
>fpinfo_i->remote_conds);
> break;
>
> case JOIN_FULL:
> fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
>   fpinfo_i->remote_conds);
> fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
>   fpinfo_o->remote_conds);
> break;
>
> default:
> /* Should not happen, we have just check this above */
> elog(ERROR, "unsupported join type %d", jointype);
> }
>
> would break the list fpinfo_i->remote_conds in the case of INNER JOIN or
> FULL JOIN.  You can see the list breakage from e.g., the following queries
> on an Assert-enabled build:
>
> postgres=# create extension postgres_fdw;
> CREATE EXTENSION
> postgres=# create server myserver foreign data wrapper postgres_fdw
> options (dbname 'mydatabase');
> CREATE SERVER
> postgres=# create user mapping for current_user server myserver;
> CREATE USER MAPPING
> postgres=# create foreign table foo (a int) server myserver options
> (table_name 'foo');
> CREATE FOREIGN TABLE
> postgres=# create foreign table bar (a int) server myserver options
> (table_name 'bar');
> CREATE FOREIGN TABLE
> postgres=# create foreign table baz (a int) server myserver options
> (table_name 'baz');
> CREATE FOREIGN TABLE
> postgres=# select * from foo, bar, baz where foo.a = bar.a and bar.a =
> baz.a and foo.a < 10 and bar.a < 10 and baz.a < 10;
>
> Attached is a patch to avoid the breakage.
>
> Best regards,
> Etsuro Fujita
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


foreign_join_ok_v2.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-15 Thread Ashutosh Bapat

During join planning, the planner tries multiple combinations of joining
relations, thus the same base or join relation can be part of multiple of
combination. Hence remote_conds or joinclauses will get linked multiple
times as they are bidirectional lists, thus breaking linkages of previous
join combinations tried. E.g. while planning A join B join C join D planner
will come up with combinations like A(B(CD)) or (AB)(CD) or ((AB)C)D etc.
and remote_conds from A will first be linked into A(B(CD)), then AB
breaking the first linkages.

On Tue, Feb 16, 2016 at 11:36 AM, Etsuro Fujita  wrote:

> On 2016/02/15 21:33, Ashutosh Bapat wrote:
>
>> Here's patch with better way to fix it. I think while concatenating the
>> lists, we need to copy the lists being appended and in all the cases. If
>> we don't copy, a change in those lists can cause changes in the upward
>> linkages and thus lists of any higher level joins.
>>
>
> Maybe I'm missing something, but I don't understand why such a change in
> those lists happens.  Could you explain about that in more detail?
>
> Best regards,
> Etsuro Fujita
>
>
>

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-15 Thread Ashutosh Bapat

On Tue, Feb 16, 2016 at 12:26 PM, Etsuro Fujita  wrote:

> On 2016/02/16 15:22, Ashutosh Bapat wrote:
>
>> During join planning, the planner tries multiple combinations of joining
>> relations, thus the same base or join relation can be part of multiple
>> of combination. Hence remote_conds or joinclauses will get linked
>> multiple times as they are bidirectional lists, thus breaking linkages
>> of previous join combinations tried. E.g. while planning A join B join C
>> join D planner will come up with combinations like A(B(CD)) or (AB)(CD)
>> or ((AB)C)D etc. and remote_conds from A will first be linked into
>> A(B(CD)), then AB breaking the first linkages.
>>
>
> Exactly, but I don't think that that needs to be considered because we
> have this at the beginning of postgresGetGForeignJoinPaths:
>
> /*
>  * Skip if this join combination has been considered already.
>  */
> if (joinrel->fdw_private)
> return;
>
>
There will be different joinrels for A(B(CD)) and (AB) where A's
remote_conds need to be pulled up. The check you have mentioned above only
protects us from adding paths multiple times to (AB) when we encounter it
for (AB)(CD) and ((AB)C)D.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] GetExistingLocalJoinPath() vs. the docs

2016-02-16 Thread Ashutosh Bapat

On Mon, Feb 15, 2016 at 9:11 PM, Stephen Frost  wrote:

> Greetings,
>
> While getting back into the thread Re: Optimization for updating foreign
> tables in Postgres FDW, I noticed some issues with the docs and
> GetExistingLocalJoinPath():
>
> GetExistingLocalJoinPath() exists in src/backend/foreign/foreign.c, but
> the docs include discussion of it under 54.2 - Foreign Data Wrapper
> Callback Routines.  Shouldn't this be under 54.3. Foreign Data Wrapper
> Helper Functions?


Yes


> Also, the prototype is incorrect in the docs (it
> doesn't return a void)


Thanks for pointing that out.


> and the paragraph about what it's for could do
> with some wordsmithing.
>

Any specific suggestions?


>
> A link from 54.2 to 54.3 which mentions it would be fine, of course.
>

Ok

PFA patch fixing those things.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


geljp_doc.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Pushing down sorted joins

2016-02-17 Thread Ashutosh Bapat

Hi All,
Now that we have join pushdown support in postgres_fdw, we can leverage the
sort pushdown mechanism for base relations to work for pushed down joins as
well. PFA patch for the same.

The code to find useful set of pathkeys and then generate paths for each
list of pathkeys is moved into a function which is called for base
relations and join relations, while creating respective paths. The useful
pathkeys are same as the base relation i.e. root->query_pathkeys and
pathkeys useful for merge join as discussed in [1].

I measured performance of pushing down sort for merge joins for query
SELECT lt1.val, ft1.val, ft2.val FROM lt1 join (ft1 join ft2 on (ft1.val =
ft2.val)) on (lt1.val = ft1.val) where ft1, ft2 are foreign tables, join
between which gets pushed down to the foreign server and lt is the local
table.

Without the patch servers prefers local merge join between foreign tables
followed by merge join with local table by getting the data sorted from the
foreign server. But with the patch, it pushes down the foreign join and
also gets the data sorted for local merge join. The times measured over 10
runs of query with and without patch are

With patch
 avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--+--+--+--
   60310.0369 | 251.075471210925 |59895.064 |60746.496

Without patch
 avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--+--+--+--
   86396.6001 |  254.30988131848 |85906.606 |86742.311

With the patch the execution time of the query reduces by 30%.

The scripts to setup and run query and outputs of running query with and
without patch are attached.


[1]
http://www.postgresql.org/message-id/CAFjFpRfeKHiCmwJ72p4=zvuzrqsau9tbfyw7vwr-5ppvrcb...@mail.gmail.com
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


pg_join_sort_pd.patch
Description: application/download
\set num_samples 10
\set query 'SELECT lt1.val, ft1.val, ft2.val FROM lt1 join (ft1 join ft2 on (ft1.val = ft2.val)) on (lt1.val = ft1.val)'
EXPLAIN (VERBOSE, ANALYSE) :query ;
SELECT avg_exe_time, std_dev_exe_time, min_exe_time, max_exe_time
		FROM query_execution_stats(:'query', :num_samples);
\set num_rows (1000*1000*10)
-- Create local tables (to be pointed by foreign tables)
DROP TABLE lt1 CASCADE;
CREATE TABLE lt1(val int, val2 int);
INSERT INTO lt1 SELECT i, i FROM (SELECT generate_series(1, :num_rows)) s(i);
CREATE INDEX i_lt1_val ON lt1(val);


DROP TABLE lt2 CASCADE;
CREATE TABLE lt2(val int, val2 int);
INSERT INTO lt2 SELECT i, i FROM (SELECT generate_series(1, :num_rows)) s(i);
CREATE INDEX i_lt2_val ON lt2(val);

DROP TABLE lt3 CASCADE;
CREATE TABLE lt3(val int, val2 int);
INSERT INTO lt3 SELECT i, i FROM (SELECT generate_series(1, :num_rows)) s(i);
CREATE INDEX i_lt3_val ON lt3(val);

DROP EXTENSION postgres_fdw CASCADE;
create extension postgres_fdw;
CREATE SERVER pg_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname 'postgres');
CREATE USER MAPPING FOR CURRENT_USER SERVER pg_server;
CREATE FOREIGN TABLE ft1 (val int, val2 int) SERVER pg_server OPTIONS (table_name 'lt1',
use_remote_estimate 'true');
CREATE FOREIGN TABLE ft2 (val int, val2 int) SERVER pg_server OPTIONS (table_name 'lt2',
use_remote_estimate 'true');

DROP FUNCTION query_execution_stats(query text, num_samples int,
	OUT avg_exe_time float,
	OUT exec_time_dev float,
	OUT min_exe_time float,
	OUT max_exe_time float);

CREATE FUNCTION query_execution_stats(query text, num_samples int,
	OUT avg_exe_time float,
	OUT std_dev_exe_time float,
	OUT min_exe_time float,
	OUT max_exe_time float)
RETURNS record LANGUAGE plpgsql AS $$
DECLARE
	plan json;
BEGIN
	CREATE TEMPORARY TABLE query_exe_times(exe_time float); 

	-- Execute query a few times (5% of user specified runs) to warm the cache
	FOR i IN 1 .. num_samples/20 LOOP
		EXECUTE query;
	END LOOP;

	FOR i IN 1 .. num_samples LOOP
		EXECUTE 'EXPLAIN (analyze, format json) ' || query INTO plan;
		INSERT INTO query_exe_times VALUES ((plan->0->'Execution Time')::text::float);
		RAISE NOTICE 'completed % samples', i;
	END LOOP;

	SELECT avg(exe_time), stddev(exe_time), min(exe_time), max(exe_time)
		INTO avg_exe_time, std_dev_exe_time, min_exe_time, max_exe_time
		FROM query_exe_times;

	DROP TABLE query_exe_times;
END;
$$;

ANALYZE ft1;
ANALYZE ft2;
ANALYZE lt1;
ANALYZE lt2;
ANALYZE lt3;


sort_pd.out.without_patch
Description: Binary data


sort_pd.out.with_patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-18 Thread Ashutosh Bapat

On Thu, Feb 18, 2016 at 3:48 PM, Etsuro Fujita 
wrote:

> On 2016/02/16 16:40, Etsuro Fujita wrote:
>
>> On 2016/02/16 16:02, Ashutosh Bapat wrote:
>>
>>> On Tue, Feb 16, 2016 at 12:26 PM, Etsuro Fujita
>>> mailto:fujita.ets...@lab.ntt.co.jp>>
>>> wrote:
>>>
>>
> On 2016/02/16 15:22, Ashutosh Bapat wrote:
>>>
>>
> During join planning, the planner tries multiple combinations of
>>> joining
>>> relations, thus the same base or join relation can be part of
>>> multiple
>>> of combination. Hence remote_conds or joinclauses will get linked
>>> multiple times as they are bidirectional lists, thus breaking
>>> linkages
>>> of previous join combinations tried. E.g. while planning A join
>>> B join C
>>> join D planner will come up with combinations like A(B(CD)) or
>>> (AB)(CD)
>>> or ((AB)C)D etc. and remote_conds from A will first be linked
>>> into
>>> A(B(CD)), then AB breaking the first linkages.
>>>
>>
> Exactly, but I don't think that that needs to be considered because
>>> we have this at the beginning of postgresGetGForeignJoinPaths:
>>>
>>>  /*
>>>   * Skip if this join combination has been considered already.
>>>   */
>>>  if (joinrel->fdw_private)
>>>  return;
>>>
>>
> There will be different joinrels for A(B(CD)) and (AB) where A's
>>> remote_conds need to be pulled up.
>>>
>>
> Agreed.
>>
>
> The check you have mentioned above
>>> only protects us from adding paths multiple times to (AB) when we
>>> encounter it for (AB)(CD) and ((AB)C)D.
>>>
>>
> Sorry, I don't understand this fully.
>>
>
> Another thing I don't really understand is why list_copy is needed in the
> second list_concat for the case of INNER/FULL JOIN or in both list_concats
> for the case of LEFT/RIGHT JOIN, in your patch.  Since list_concat is
> nondestructive of its second argument, I don't think list_copy is needed in
> any such list_concat.  Maybe I'm missing something, though.
>

If the list in the joining relation changes (may be because we appended
parameterized conditions), we would be breaking links on all the upper
relations in the join tree in an unpredictable manner. The problem may not
show up now, but it's an avenue for unrecognizable bugs. So, it's safer to
copy the lists in the state that we want them.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Pushing down sorted joins

2016-02-23 Thread Ashutosh Bapat

Rushabh pointed out that declarations of helper functions
get_useful_ecs_for_relation and get_useful_pathkeys_for_relation() are part
of FDW routines declarations rather than helper function declaration. Since
those functions are related to this patch, the attached patch moves those
declaration in their right place.

On Wed, Feb 17, 2016 at 5:37 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> Hi All,
> Now that we have join pushdown support in postgres_fdw, we can leverage
> the sort pushdown mechanism for base relations to work for pushed down
> joins as well. PFA patch for the same.
>
> The code to find useful set of pathkeys and then generate paths for each
> list of pathkeys is moved into a function which is called for base
> relations and join relations, while creating respective paths. The useful
> pathkeys are same as the base relation i.e. root->query_pathkeys and
> pathkeys useful for merge join as discussed in [1].
>
> I measured performance of pushing down sort for merge joins for query
> SELECT lt1.val, ft1.val, ft2.val FROM lt1 join (ft1 join ft2 on (ft1.val =
> ft2.val)) on (lt1.val = ft1.val) where ft1, ft2 are foreign tables, join
> between which gets pushed down to the foreign server and lt is the local
> table.
>
> Without the patch servers prefers local merge join between foreign tables
> followed by merge join with local table by getting the data sorted from the
> foreign server. But with the patch, it pushes down the foreign join and
> also gets the data sorted for local merge join. The times measured over 10
> runs of query with and without patch are
>
> With patch
>  avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
> --+--+--+--
>60310.0369 | 251.075471210925 |59895.064 |60746.496
>
> Without patch
>  avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
> --+--+--+--
>86396.6001 |  254.30988131848 |85906.606 |86742.311
>
> With the patch the execution time of the query reduces by 30%.
>
> The scripts to setup and run query and outputs of running query with and
> without patch are attached.
>
>
> [1]
> http://www.postgresql.org/message-id/CAFjFpRfeKHiCmwJ72p4=zvuzrqsau9tbfyw7vwr-5ppvrcb...@mail.gmail.com
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


pg_join_sort_pd_v2.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Declarative partitioning - another take

2016-08-21 Thread Ashutosh Bapat

umeric literal or NULL.
>
> Note that the one can specify PARTITION BY when creating a partition
> itself. That is to allow creating multi-level partitioned tables.
>
>
> 0005-Teach-a-few-places-to-use-partition-check-constraint.patch
>
> A partition's bound implicitly constrains the values that are allowed in
> the partition key of its rows.  The same can be applied to partitions when
> inserting data *directly* into them to make sure that only the correct
> data is allowed in (if a tuple has been routed from the parent, that
> becomes unnecessary). To that end, ExecConstraints() now includes the
> above implicit check constraint in the list of constraints it enforces.
>
> Further, to enable constraint based partition exclusion on partitioned
> tables, the planner code includes in its list of constraints the above
> implicitly defined constraints.  This arrangement is temporary however and
> will be rendered unnecessary when we implement special data structures and
> algorithms within the planner in future versions of this patch to use
> partition metadata more effectively for partition exclusion.
>
> Note that the "constraints" referred to above are not some on-disk
> structures but those generated internally on-the-fly when requested by a
> caller.
>
> 0006-Introduce-a-PartitionTreeNode-data-structure.patch
> 0007-Tuple-routing-for-partitioned-tables.patch
>
> These patches enable routing of tuples inserted into a partitioned table
> to one of its leaf partitions.  It applies to both COPY FROM and INSERT.
> First of these patches introduces a data structure that provides a
> convenient means for the tuple routing code to step down a partition tree
> one level at a time.  The second one modifies copy.c and executor to
> implement actual tuple routing.  When inserting into a partition, its row
> constraints and triggers are applied.  Note that the partition's
> constraints also include the constraints defined on the parent.  This
> arrangements means however that the parent's triggers are not currently
> applied.
>
> Updates are handled like they are now for inheritance sets, however, if an
> update makes a row change partition, an error will be thrown.
>
> 0008-Update-DDL-Partitioning-chapter.patch
>
> This patch updates the partitioning section in the DDL chapter to reflect
> the new methods made available for creating and managing partitioned table
> and its partitions.  Especially considering that it is no longer necessary
> to define CHECK constraints and triggers/rules manually for constraint
> exclusion and tuple routing, respectively.
>
> TODO (in short term):
> * Add more regression tests and docs
> * Add PartitionOptInfo and use it to perform partition pruning more
> effectively (the added infrastructure should also help pairwise joins
> patch proposed by Ashutosh Bapat [1])
> * Fix internal representation of list partition bounds to be more efficient
>
>
> Thanks,
> Amit
>
> [1]
> https://www.postgresql.org/message-id/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXn
> a-urjm_UY9BqXj%3DEaDTSA%40mail.gmail.com
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Push down more full joins in postgres_fdw

2016-08-21 Thread Ashutosh Bapat

Output: t1.ctid, t1.*, t2.*, t1.c1, t1.c3
>   Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
>   Remote SQL: SELECT ss1.c1, ss1.c2, ss1.c3, ss1.c4, ss2.c1 FROM
> ((SELECT ctid, ROW("C 1", c2, c3, c4, c5, c6, c7, c8), "C 1", c3 FROM "S
> 1"."T \
> 1") ss1(c1, c2, c3, c4) INNER JOIN (SELECT ROW("C 1", c2, c3, c4, c5, c6,
> c7, c8), "C 1" FROM "S 1"."T 1") ss2(c1, c2) ON (TRUE)) WHERE ((ss1.c3 =
> ss2.\
> c2)) ORDER BY ss1.c4 ASC NULLS LAST, ss1.c3 ASC NULLS LAST
> (6 rows)
>
> I'll add this to the next CF.  Comments are welcome!
>
> Best regards,
> Etsuro Fujita
>
> [1] https://www.postgresql.org/message-id/5710D7E2.7010302%40lab.ntt.co.jp
> [2] https://www.postgresql.org/message-id/b4549406-909f-7d15-dc3
> 4-499835a8f0b3%40lab.ntt.co.jp
> [3] https://www.postgresql.org/message-id/a1fa1c4c-bf96-8ea5-cff
> 5-85b927298e73%40lab.ntt.co.jp
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Declarative partitioning - another take

2016-08-25 Thread Ashutosh Bapat

On Thu, Aug 25, 2016 at 12:22 PM, Amit Langote <
langote_amit...@lab.ntt.co.jp> wrote:

> On 2016/08/22 13:51, Ashutosh Bapat wrote:
> > The parent-child relationship of multi-level partitioned tables is not
> > retained when creating the AppendRelInfo nodes. We create RelOptInfo
> nodes
> > for all the leaf and intermediate tables. The AppendRelInfo nodes created
> > for these RelOptInfos set the topmost table as the parent of all the leaf
> > and child tables. Since partitioning scheme/s at each level is/are
> > associated with the parent/s at that level, we loose information about
> the
> > immediate parents and thus it becomes difficult to identify which leaf
> node
> > falls where in the partition hierarchy. This stops us from doing any
> > lump-sum partition pruning where we can eliminate all the partitions
> under
> > a given parent-partition if that parent-partition gets pruned. It also
> > restricts partition-wise join technique from being applied to partial
> > partition hierarchy when the whole partitioning scheme of joining tables
> > does not match. Maintaining a RelOptInfo hierarchy should not create
> > corresponding Append (all kinds) plan hierarchy since
> > accumulate_append_subpath() flattens any such hierarchy while creating
> > paths. Can you please consider this point in your upcoming patch?
>
> I agree.  So there seem to be two things here:  a) when expanding a
> partitioned table inheritance set, do it recursively such that resulting
> AppendRelInfos preserve *immediate* parent-child relationship info.

Right.

> b)
> when accumulating append subpaths, do not flatten a subpath that is itself
> an append when ((AppendPath *) subpath)->path.parent is a RelOptInfo with
> non-NULL partitioning info.Is the latter somehow necessary for
> pairwise-join considerations?
>

I don't think you need to do anything in the path creation code for this.
As is it flattens all AppendPath hierarchies whether for partitioning or
inheritance or subqueries. We should leave it as it is.

> I think I can manage to squeeze in (a) in the next version patch and will
> also start working on (b), mainly the part about RelOptInfo getting some
> partitioning info.
>

I am fine with b, where you would include some partitioning information in
RelOptInfo. But you don't need to do what you said in (b) above.

In a private conversation Robert Haas suggested a way slightly different
than what my patch for partition-wise join does. He suggested that the
partitioning schemes i.e strategy, number of partitions and bounds of the
partitioned elations involved in the query should be stored in PlannerInfo
in the form of a list. Each partitioning scheme is annotated with the
relids of the partitioned relations. RelOptInfo of the partitioned relation
will point to the partitioning scheme in PlannerInfo. Along-with that each
RelOptInfo will need to store partition keys for corresponding relation.
This simplifies matching the partitioning schemes of the joining relations.
Also it reduces the number of copies of partition bounds floating around as
we expect that a query will involve multiple partitioned tables following
similar partitioning schemes. May be you want to consider this idea while
working on (b).

> Thanks,
> Amit
>
>
>

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Transactions involving multiple postgres foreign servers

2016-08-25 Thread Ashutosh Bapat

On Fri, Aug 26, 2016 at 11:22 AM, Masahiko Sawada 
wrote:

> On Fri, Aug 26, 2016 at 1:32 PM, Vinayak Pokale 
> wrote:
> > Hi All,
> >
> > Ashutosh proposed the feature 2PC for FDW for achieving atomic commits
> > across multiple foreign servers.
> > If a transaction make changes to more than two foreign servers the
> current
> > implementation in postgres_fdw doesn't make sure that either all of them
> > commit or all of them rollback their changes.
> >
> > We (Masahiko Sawada and me) reopen this thread and trying to contribute
> in
> > it.
> >
> > 2PC for FDW
> > 
> > The patch provides support for atomic commit for transactions involving
> > foreign servers. when the transaction makes changes to foreign servers,
> > either all the changes to all the foreign servers commit or rollback.
> >
> > The new patch 2PC for FDW include the following things:
> > 1. The patch 0001 introduces a generic feature. All kinds of FDW that
> > support 2PC such as oracle_fdw, mysql_fdw, postgres_fdw etc. can involve
> in
> > the transaction.
> >
> > Currently we can push some conditions down to shard nodes, especially in
> 9.6
> > the directly modify feature has
> > been introduced. But such a transaction modifying data on shard node is
> not
> > executed surely.
> > Using 0002 patch, that modify is executed with 2PC. It means that we
> almost
> > can provide sharding solution using
> > multiple PostgreSQL server (one parent node and several shared node).
> >
> > For multi master, we definitely need transaction manager but transaction
> > manager probably can use this 2PC for FDW feature to manage distributed
> > transaction.
> >
> > 2. 0002 patch makes postgres_fdw possible to use 2PC.
> >
> > 0002 patch makes postgres_fdw to use below APIs. These APIs are generic
> > features which can be used by all kinds of FDWs.
> >
> > a. Execute PREAPRE TRANSACTION and COMMIT/ABORT PREAPRED instead of
> > COMMIT/ABORT on foreign server which supports 2PC.
> > b. Manage information of foreign prepared transactions resolver
> >
> > Masahiko Sawada will post the patch.
> >
> >
>
>
Thanks Vinayak and Sawada-san for taking this forward and basing your work
on my patch.


> Still lot of work to do but attached latest patches.
> These are based on the patch Ashutosh posted before, I revised it and
> divided into two patches.
> Compare with original patch, patch of pg_fdw_xact_resolver and
> documentation are lacked.
>

I am not able to understand the last statement.

Do you mean to say that your patches do not have pg_fdw_xact_resolver() and
documentation that my patches had?

OR

you mean to say that my patches did not have (lacked)
pg_fdw_xact_resolver() and documenation

OR some combination of those?
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Transactions involving multiple postgres foreign servers

2016-08-25 Thread Ashutosh Bapat

On Fri, Aug 26, 2016 at 11:37 AM, Masahiko Sawada 
wrote:

> On Fri, Aug 26, 2016 at 3:03 PM, Ashutosh Bapat
>  wrote:
> >
> >
> > On Fri, Aug 26, 2016 at 11:22 AM, Masahiko Sawada  >
> > wrote:
> >>
> >> On Fri, Aug 26, 2016 at 1:32 PM, Vinayak Pokale 
> >> wrote:
> >> > Hi All,
> >> >
> >> > Ashutosh proposed the feature 2PC for FDW for achieving atomic commits
> >> > across multiple foreign servers.
> >> > If a transaction make changes to more than two foreign servers the
> >> > current
> >> > implementation in postgres_fdw doesn't make sure that either all of
> them
> >> > commit or all of them rollback their changes.
> >> >
> >> > We (Masahiko Sawada and me) reopen this thread and trying to
> contribute
> >> > in
> >> > it.
> >> >
> >> > 2PC for FDW
> >> > 
> >> > The patch provides support for atomic commit for transactions
> involving
> >> > foreign servers. when the transaction makes changes to foreign
> servers,
> >> > either all the changes to all the foreign servers commit or rollback.
> >> >
> >> > The new patch 2PC for FDW include the following things:
> >> > 1. The patch 0001 introduces a generic feature. All kinds of FDW that
> >> > support 2PC such as oracle_fdw, mysql_fdw, postgres_fdw etc. can
> involve
> >> > in
> >> > the transaction.
> >> >
> >> > Currently we can push some conditions down to shard nodes, especially
> in
> >> > 9.6
> >> > the directly modify feature has
> >> > been introduced. But such a transaction modifying data on shard node
> is
> >> > not
> >> > executed surely.
> >> > Using 0002 patch, that modify is executed with 2PC. It means that we
> >> > almost
> >> > can provide sharding solution using
> >> > multiple PostgreSQL server (one parent node and several shared node).
> >> >
> >> > For multi master, we definitely need transaction manager but
> transaction
> >> > manager probably can use this 2PC for FDW feature to manage
> distributed
> >> > transaction.
> >> >
> >> > 2. 0002 patch makes postgres_fdw possible to use 2PC.
> >> >
> >> > 0002 patch makes postgres_fdw to use below APIs. These APIs are
> generic
> >> > features which can be used by all kinds of FDWs.
> >> >
> >> > a. Execute PREAPRE TRANSACTION and COMMIT/ABORT PREAPRED instead
> of
> >> > COMMIT/ABORT on foreign server which supports 2PC.
> >> > b. Manage information of foreign prepared transactions resolver
> >> >
> >> > Masahiko Sawada will post the patch.
> >> >
> >> >
> >>
> >
> > Thanks Vinayak and Sawada-san for taking this forward and basing your
> work
> > on my patch.
> >
> >>
> >> Still lot of work to do but attached latest patches.
> >> These are based on the patch Ashutosh posted before, I revised it and
> >> divided into two patches.
> >> Compare with original patch, patch of pg_fdw_xact_resolver and
> >> documentation are lacked.
> >
> >
> > I am not able to understand the last statement.
>
> Sorry to confuse you.
>
> > Do you mean to say that your patches do not have pg_fdw_xact_resolver()
> and
> > documentation that my patches had?
>
> Yes.
> I'm confirming them that your patches had.
>

Thanks for the clarification. I had added pg_fdw_xact_resolver() to resolve
any transactions which can not be resolved immediately after they were
prepared. There was a comment from Kevin (IIRC) that leaving transactions
unresolved on the foreign server keeps the resources locked on those
servers. That's not a very good situation. And nobody but the initiating
server can resolve those. That functionality is important to make it a
complete 2PC solution. So, please consider it to be included in your first
set of patches.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Declarative partitioning - another take

2016-09-01 Thread Ashutosh Bapat

>
> > I don't think you need to do anything in the path creation code for this.
> > As is it flattens all AppendPath hierarchies whether for partitioning or
> > inheritance or subqueries. We should leave it as it is.
>
> I thought it would be convenient for pairwise join code to work with the
> hierarchy intact even within the AppendPath tree.  If it turns out to be
> so, maybe that patch can take care of it.
>

Partition-wise join work with RelOptInfos, so it's fine if the AppendPath
hierarchy is flattened out. We need the RelOptInfo hierarchy though.


>
> >> I think I can manage to squeeze in (a) in the next version patch and
> will
> >> also start working on (b), mainly the part about RelOptInfo getting some
> >> partitioning info.
> >
> > I am fine with b, where you would include some partitioning information
> in
> > RelOptInfo. But you don't need to do what you said in (b) above.
> >
> > In a private conversation Robert Haas suggested a way slightly different
> > than what my patch for partition-wise join does. He suggested that the
> > partitioning schemes i.e strategy, number of partitions and bounds of the
> > partitioned elations involved in the query should be stored in
> PlannerInfo
> > in the form of a list. Each partitioning scheme is annotated with the
> > relids of the partitioned relations. RelOptInfo of the partitioned
> relation
> > will point to the partitioning scheme in PlannerInfo. Along-with that
> each
> > RelOptInfo will need to store partition keys for corresponding relation.
> > This simplifies matching the partitioning schemes of the joining
> relations.
> > Also it reduces the number of copies of partition bounds floating around
> as
> > we expect that a query will involve multiple partitioned tables following
> > similar partitioning schemes. May be you want to consider this idea while
> > working on (b).
>
> So IIUC, a partitioned relation's (baserel or joinrel) RelOptInfo has only
> the information about partition keys.  They will be matched with query
> restriction quals pruning away any unneeded partitions which happens
> individually for each such parent baserel (within set_append_rel_size() I
> suppose).  Further, two joining relations are eligible to be considered
> for pairwise joining if they have identical partition keys and query
> equi-join quals match the same.  The resulting joinrel will have the same
> partition key (as either joining relation) and will have as many
> partitions as there are in the intersection of sets of partitions of
> joining rels (intersection proceeds by matching partition bounds).
>
> "Partition scheme" structs go into a PlannerInfo list member, one
> corresponding to each partitioned relation - baserel or joinrel, right?
>

Multiple relations (base or join) can share Partition Scheme if they are
partitioned the same way. Each partition scheme also stores the relids of
the base relations partitioned by that scheme.


> As you say, each such struct has the following pieces of information:
> strategy, num_partitions, bounds (and other auxiliary info).  Before
> make_one_rel() starts, the list has one for each partitioned baserel.
>
After make_one_rel() has formed baserel pathlists and before
> make_rel_from_joinlist() is called, are the partition scheme structs of
> processed baserels marked with some information about the pruning activity
> that occurred so far?


Right now pruned partitions are labelled as dummy rels (empty appent
paths). That's enough to detect a pruned partition. I haven't found a need
to label partitioning scheme with pruned partitions for partition-wise join.


> Then as we build successively higher levels of
> joinrels, new entries will be made for those joinrels for which we added
> pairwise join paths, with relids matching the corresponding joinrels.
> Does that make sense?
>
>
I don't think we will make any new partition scheme entry in PlannerInfo
after all the base relations have been considered. Partitionin-wise join
will pick the one suitable for the given join. But in case partition-wise
join needs to make new entries, I will take care of that in my patch.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Declarative partitioning - another take

2016-09-01 Thread Ashutosh Bapat

t a time.  The second one modifies copy.c and executor to
> implement actual tuple routing.  When inserting into a partition, its row
> constraints and triggers are applied.  Note that the partition's
> constraints also include the constraints defined on the parent.  This
> arrangements means however that the parent's triggers are not currently
> applied.
>
> Updates are handled like they are now for inheritance sets, however, if an
> update makes a row change partition, an error will be thrown.
>
> 0008-Update-DDL-Partitioning-chapter.patch
>
> This patch updates the partitioning section in the DDL chapter to reflect
> the new methods made available for creating and managing partitioned table
> and its partitions.  Especially considering that it is no longer necessary
> to define CHECK constraints and triggers/rules manually for constraint
> exclusion and tuple routing, respectively.
>
> TODO (in short term):
> * Add more regression tests and docs
> * Add PartitionOptInfo and use it to perform partition pruning more
> effectively (the added infrastructure should also help pairwise joins
> patch proposed by Ashutosh Bapat [1])
> * Fix internal representation of list partition bounds to be more efficient
>
>
> Thanks,
> Amit
>
> [1]
> https://www.postgresql.org/message-id/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXn
> a-urjm_UY9BqXj%3DEaDTSA%40mail.gmail.com
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Declarative partitioning - another take

2016-09-01 Thread Ashutosh Bapat

>
>
> > 2. A combination of constraints on the partitions should be applicable to
> > the parent. We aren't doing that.
>
> How about on seeing that a RELOPT_OTHER_MEMBER_REL is partitioned parent
> table, we can have get_relation_constraints() include a constant false
> clause in the list of constraints returned for
> relation_excluded_by_constraints() to process so that it is not included
> in the append result by way of constraint exclusion.  One more option is
> to mark such rels dummy in set_rel_size().
>
>
I am not complaining about having parent relation there. For the people who
are used to seeing the parent relation in the list of append relations, it
may be awkward. But +1 if we can do that. If we can't do that, we should at
least mark with an OR of all constraints on the partitions, so that
constraint exclusion can exclude it if there are conditions incompatible
with constraints. This is what would happen in inheritance case as well, if
there are constraints on the parent. In the above example, the parent table
would have constraints CHECK ((a >= 0 AND a < 250) OR (a >= 250 and a <
500) OR (a >= 500 or a < 600)). It will probably get excluded, if
constraint exclusion is smart enough to understand ORing.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Declarative partitioning - another take

2016-09-01 Thread Ashutosh Bapat

On Fri, Sep 2, 2016 at 12:23 PM, Amit Langote  wrote:

> On 2016/09/02 15:22, Ashutosh Bapat wrote:
> >>
> >>
> >>> 2. A combination of constraints on the partitions should be applicable
> to
> >>> the parent. We aren't doing that.
> >>
> >> How about on seeing that a RELOPT_OTHER_MEMBER_REL is partitioned parent
> >> table, we can have get_relation_constraints() include a constant false
> >> clause in the list of constraints returned for
> >> relation_excluded_by_constraints() to process so that it is not
> included
> >> in the append result by way of constraint exclusion.  One more option is
> >> to mark such rels dummy in set_rel_size().
> >>
> >>
> > I am not complaining about having parent relation there. For the people
> who
> > are used to seeing the parent relation in the list of append relations,
> it
> > may be awkward. But +1 if we can do that. If we can't do that, we should
> at
> > least mark with an OR of all constraints on the partitions, so that
> > constraint exclusion can exclude it if there are conditions incompatible
> > with constraints. This is what would happen in inheritance case as well,
> if
> > there are constraints on the parent. In the above example, the parent
> table
> > would have constraints CHECK ((a >= 0 AND a < 250) OR (a >= 250 and a <
> > 500) OR (a >= 500 or a < 600)). It will probably get excluded, if
> > constraint exclusion is smart enough to understand ORing.
>
> I guess constraint exclusion would be (is) smart enough to handle that
> correctly but why make it unnecessarily spend a *significant* amount of
> time on doing the proof (when we *know* we can just skip it).  Imagine how
> long the OR list could get.  By the way, my suggestion of just returning a
> constant false clause also does not work - neither in case of a query with
> restrictions (predicate has to be an OpExpr to go ahead with the proof
> which constant false clause is not) nor in case of a query without
> restrictions (proof is not performed at all).  So, that one is useless.
>

Huh!


>
> Getting rid of the parent table in the append list by other means may be a
> way to go.  We know that the table is empty and safe to just drop.
>
>
Ok. Does a constraint (1 = 2) or something like that which is obviously
false, help?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Declarative partitioning - another take

2016-09-06 Thread Ashutosh Bapat

ve of parent's, selecting a parent's NOT NULL
> column might return nulls from the child table that no longer has the
> constraint.
>
> I recently came across a related proposal whereby dropping *inherited* NOT
> NULL  from child tables will be prevented.  Problems in letting it be be
> dropped are mentioned here:
>
> https://www.postgresql.org/message-id/21633.1448383...@sss.pgh.pa.us
>
> That proposal is probably being reworked such that NOT NULL constraints
> get a pg_constraint entry with proper accounting of inheritance count.
>
> > +ERROR:  new partition's list of values overlaps with partition
> > "lpart1" of "list_parted"
> >
> > Maybe just:
> >
> > ERROR: partitions must not overlap
> > -or-
> > ERROR: partition "%s" would overlap partition "%s"
>
> OK, I changed to the second message.
>
> >
> > As before, this is just an initial read-through, so apologies for
> > whatever I may have missed.
>
> Thanks a lot for the review.
>
> By the way, I am still working on the following items and will be included
> in the next version of the patch.
>
> * Fix internal representation of list partition bounds to be more efficient
> * Add PartitionOptInfo
>
> As mentioned in [2], I have refactored the inheritance expansion code
> within optimizer so that a partitioned table's inheritance hierarchy is
> preserved in resulting AppendRelInfos (patch 0005).  One immediate benefit
> of that is that if constraint exclusion determines that an intermediate
> partition is to be excluded then all partitions underneath it are excluded
> automatically.  With the current flattened model, all partitions in that
> subtree would have been processed and excluded one-by-one.
>
> Regards,
> Amit
>
> [1]
> https://www.postgresql.org/message-id/CA%2BTgmoZ008qTgd_Qg6_
> oZb3i0mOYrS6MdhncwgcqPKahixjarg%40mail.gmail.com
> [2]
> https://www.postgresql.org/message-id/f2a9592a-17e9-4c6a-
> e021-03b802195ce7%40lab.ntt.co.jp
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 59f2127..f17ac29 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -281,7 +281,7 @@ RelationBuildPartitionKey(Relation relation)
 	oidvector	   *opclass;
 	KeyTypeCollInfo *tcinfo;
 	ListCell	   *partexprbin_item;
-	List		   *partexprsrc;
+	List		   *partexprsrc = NIL;
 	ListCell	   *partexprsrc_item;
 	Datum			datum;
 	MemoryContext	partkeycxt,

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Declarative partitioning - another take

2016-09-06 Thread Ashutosh Bapat

This patch uses RangeBound structure. There's also a structure defined with
the same name in rangetypes.h with some slight differences. Should we
rename the one in partition.c as PartRangeBound or something like that to
avoid the confusion?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Push down more full joins in postgres_fdw

2016-09-06 Thread Ashutosh Bapat

On Fri, Sep 2, 2016 at 3:55 PM, Etsuro Fujita 
wrote:

> Hi Ashutosh,
>
> On 2016/08/22 15:49, Ashutosh Bapat wrote:
>
>> 1. deparsePlaceHolderVar looks odd - each of the deparse* function is
>> named as deparse + > into>. PlaceHolderVar is not a parser node, so no string is going to be
>> parsed as a PlaceHolderVar. May be you want to name it as
>> deparseExerFromPlaceholderVar or something like that.
>>
>
> The name "deparseExerFromPlaceholderVar" seems long to me.  How about
> "deparsePlaceHolderExpr"?
>

There isn't any node with name PlaceHolderExpr.

> 2. You will need to check phlevelsup member while assessing whether a
>> PHV is safe to push down.
>>
>
> Good catch!  In addition to that, I added another check that the eval_at
> set for the PHV should be included in the relids set for the foreign
> relation.  I think that would make the shippability check more robust.
>

Thanks.

>
> 3. I think registerAlias stuff should happen really at the time of
>> creating paths and should be stored in fpinfo. Without that it will be
>> computed every time we deparse the query. This means every time we try
>> to EXPLAIN the query at various levels in the join tree and once for the
>> query to be sent to the foreign server.
>>
>
> Hmm.  I think the overhead in calculating aliases would be negligible
> compared to the overhead in explaining each remote query for costing or
> sending the remote query for execution.  So, I created aliases in the same
> way as remote params created and stored into params_list in
> deparse_expr_cxt.  I'm not sure it's worth complicating the code.
>

We defer remote parameter creation till deparse time since the the
parameter numbers are dependent upon the sequence in which we deparse the
query. Creating them at the time of path creation and storing them in
fpinfo is not possible because a. those present in the joining relations
will conflict with each other and need some kind of serialization at the
time of deparsing b. those defer for differently parameterized paths or
paths with different pathkeys. We don't defer it because it's very light on
performance.

That's not true with the alias information. As long as we detect which
relations need subqueries, their RTIs are enough to create unique aliases
e.g. if a relation involves RTIs 1, 2, 3 corresponding subquery can have
alias r123 without conflicting with any other alias.

However minimum overhead it might have, we will deparse the query every
time we create a path involving that kind of relation i.e. for different
pathkeys, different parameterization and different joins in which that
relation participates in. This number can be significant. We want to avoid
this overhead if we can.

>
> 5. The blocks related to inner and outer relations in
>> deparseFromExprForRel() look same. We should probably separate that code
>> out into a function and call it at two places.
>>
>
> Done.
>

Thanks. I see you have created function deparseOperandRelation() for the
same. I guess, this should be renamed as deparseRangeTblRef() since it will
create RangeTblRef node when parsed back.

>
> 6.
>> ! if (is_placeholder)
>> ! errcontext("placeholder expression at position %d in select
>> list",
>> !errpos->cur_attno);
>> A user wouldn't know what a placeholder expression is, there is no such
>> term in the documentation. We have to device a better way to provide an
>> error context for an expression in general.
>>
>
> Though I proposed that, I don't think that it's that important to let
> users know that the expression is from a PHV.  How about just saying
> "expression", not "placeholder expression", so that we have the message
> "expression at position %d in select list" in the context?
>
> Hmm, that's better than placeholder expression, but not as explanatory as
it should be since we won't be printing the "select list" in the error
context and user won't have a clue as to what error context is about. But I
don't have any good suggestions right now. May be we should print the whole
expression? But that can be very long, I don't know.

This patch tries to do two things at a time 1. support join pushdown for
FULL join when the joining relations have remote conditions 2. better
support for fetching placeholder vars, whole row references and some system
columns. To make reviews easy, I think we should split the patch into two
1. supporting subqueries to be deparsed with support for one of the above
(I will suggest FULL join support) 2. support for the other.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Push down more UPDATEs/DELETEs in postgres_fdw

2016-09-06 Thread Ashutosh Bapat

Thanks Fujita-san for working on this.


> * with the patch:
> postgres=# explain verbose delete from ft1 using ft2 where ft1.a = ft2.a;
>  QUERY PLAN
> 
> -
>  Delete on public.ft1  (cost=100.00..102.04 rows=1 width=38)
>->  Foreign Delete  (cost=100.00..102.04 rows=1 width=38)
>  Remote SQL: DELETE FROM public.t1 r1 USING (SELECT ROW(a, b), a
> FROM public.t2) ss1(c1, c2) WHERE ((r1.a = ss1.c2))
> (3 rows)
>
> The WIP patch has been created on top of the join pushdown patch [1]. So,
> for testing, please apply the patch in [1] first.
>
>
The underlying scan on t2 requires ROW(a,b) for locking the row for
update/share. But clearly it's not required if the full query is being
pushed down. Is there a way we can detect that ROW(a,b) is useless column
(not used anywhere in the other parts of the query like RETURNING, DELETE
clause etc.) and eliminate it? Similarly for a, it's part of the targetlist
of the underlying scan so that the WHERE clause can be applied on it. But
it's not needed if we are pushing down the query. If we eliminate the
targetlist of the query, we could construct a remote query without having
subquery in it, making it more readable.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Aggregate Push Down - Performing aggregation on foreign server

2016-09-08 Thread Ashutosh Bapat

roup reference in deparseAggOrderBy() and
appendGroupByClause() looks similar. Should we extract it out into a
function
by itself and call that function in those two places similar to
get_rule_sortgroupclause()?

17. In postgresGetForeignPlan() the code to extract conditions and
targetlist
is duplicated for join and upper relation. Attached patch removes this
duplication. Since DML, FOR SHARE/UPDATE is not allowed with aggregation and
grouping, we should get an outer plan in that code.

18. estimate_path_cost_size() has grown quite long (~400 lines). Half of
that
function deals with estimating costs locally. Should we split this function
into smaller functions, one for estimating size and costs of each kind of
relations locally?

19. Condition below
+if (sgref && query->groupClause && query->groupingSets == NIL &&
+get_sortgroupref_clause_noerr(sgref, query->groupClause) !=
NULL)
can be rewritten as
if (sgref && get_sortgroupref_clause_noerr(sgref, query->groupClause))
since we won't come here with non-NULL query->groupingSets and
get_sortgroupref_clause_noerr() will return NULL, if there aren't any
groupClause.

20. Please use word "foreign" instead of "remote" in comments.

21. This code looks dubious
+/* If not GROUP BY ref, reset it as we are not pushing those */
+if (sgref)
+grouping_target->sortgrouprefs[i] = 0;
grouping_target comes from RelOptInfo, which is being modified here. We
shouldn't be modifying anything in the RelOptInfo while checking whether the
aggregate/grouping is shippable. You may want to copy the pathtaget and
modify
the copy.

22. Following comment needs more elaboration.
+/*
+ * Add aggregates, if any, into tlist.  Plain Var nodes
pulled
+ * are already part of GROUP BY and thus no need to add
them
+ * explicitly.
+ */
Plain Var nodes will either be same as some GROUP BY expression or should be
part of some GROUP BY expression. In the later case, the query can not refer
those without the surrounding expression. which will be part of the
targetlist
list. Hence we don't need to add plain Var nodes in the targetlist. In fact
adding those in SELECT clause will cause error on the foreign server if they
are not part of GROUP BY clause.

23. Probably you don't need to save this in fpinfo. You may want to
calculate
it whenever it's needed.
+fpinfo->grouped_exprs = get_sortgrouplist_exprs(query->groupClause,
tlist);

24. Can this ever happen for postgres_fdw? The core sets fdwroutine and
calls
this function when the underlying scan relation has fdwroutine set, which
implies that it called corresponding GetForeignPath hooks, which should have
set fdw_private. Nonetheless, I think, still the condition is useful to
avoid
crashing the server in case the fdw_private is not set. But we need better
comments.
+/* If input rel is not aware of fdw, simply return */
+if (!input_rel->fdw_private)
+return;

25. Although it's desirable to get a switch case in
postgresGetForeignUpperPaths() eventually, for this patch I guess we should
have an if condition there.

26. Attached patch has restructured code in appendGroupByClause() to reduce
indentation and make the code readable. Let me know if this looks good to
you.

27. Prologue of create_foreign_grouping_paths() does not have formatting
similar to the surrounding functions. Please fix it. Since the function
"create"s and "add"s paths, it might be better to rename it as
add_foreign_grouping_paths().

28. Isn't the following true for any upper relation and shouldn't it be
part of
postgresGetForeignUpperPaths(), rather than create_foreign_grouping_paths()?
+/*
+ * If input rel is not safe to pushdown, we cannot do grouping and/or
+ * aggregation on it.
+ */
+if (!ifpinfo || !ifpinfo->pushdown_safe)
+return;

29. We require RelOptInfo::relids since create_foreignscan_plan() copies
them
into ForeignPlan::fs_relids and executor uses them. So, I guess, we have to
set
those in the core somewhere. May be while calling fetch_upper_rel() in
grouping_planner(), we should call it with relids of the underlying scan
relation. I don't see any function calling fetch_upper_rel() with non-NULL
relids. In fact, if we are to create an upper relation with relids in
future,
this code is going to wipe that out, which will be undesirable. Also,
generally, when copying relids, it's better to use bms_copy() to make a copy
of Bitmapset, instead of just assigning the pointer.

30. By the time postgresGetForeignUpperPaths() gets called, the core has
already added its own paths, so it doesn't make much sense to set rows and
width grouped_rel in create_foreign_grouping_paths().

31. fpinfo->server and user fields are being set twice, on

Re: [HACKERS] Push down more full joins in postgres_fdw

2016-09-12 Thread Ashutosh Bapat

x27;t be printing the "select list" in the
>> error context and user won't have a clue as to what error context is
>> about.
>>
>
> I don't think so.  Consider an example of the conversion error message,
> which is from the regression test:
>
> SELECT  ft1.c1,  ft2.c2, ft1 FROM ft1, ft2 WHERE ft1.c1 = ft2.c1 AND
> ft1.c1 = 1;
> ERROR:  invalid input syntax for integer: "foo"
> CONTEXT:  whole-row reference to foreign table "ft1"
>
> As shown in the example, the error message is displayed under a remote
> query for execution.  So, ISTM it's reasonable to print something like
> "expression at position %d in select list" in the context if an expression
> in a PHV.
>

I missed it. Sorry. Looks ok.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Nested loop join condition does not get pushed down to foreign scan

2016-09-13 Thread Ashutosh Bapat

On Tue, Sep 13, 2016 at 4:05 PM, Albe Laurenz  wrote:
> I just noticed something surprising:
>
> -- create a larger local table
> CREATE TABLE llarge (id integer NOT NULL, val integer NOT NULL);
> INSERT INTO llarge SELECT i, i%100 FROM generate_series(1, 1) i;
> ALTER TABLE llarge ADD PRIMARY KEY (id);
>
> -- create a small local table
> CREATE TABLE small (id integer PRIMARY KEY, val text NOT NULL);
> INSERT INTO small VALUES (1, 'one');
>
> -- create a foreign table based on llarge
> CREATE EXTENSION postgres_fdw;
> CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 
> 'localhost', port '5432', dbname 'test');
> CREATE USER MAPPING FOR myself SERVER loopback OPTIONS (user 'myself', 
> password 'mypassword');
> CREATE FOREIGN TABLE rlarge (id integer NOT NULL, val integer NOT NULL) 
> SERVER loopback OPTIONS (table_name 'llarge');
>
> SET enable_hashjoin = off;
> -- plan for a nested loop join with a local table
> EXPLAIN (COSTS off) SELECT * FROM small JOIN llarge USING (id);
>   QUERY PLAN
> --
>  Nested Loop
>->  Seq Scan on small
>->  Index Scan using llarge_pkey on llarge
>  Index Cond: (id = small.id)
> (4 rows)
>
> -- plan for a nested loop join with a foreign table
> EXPLAIN (COSTS off) SELECT * FROM small JOIN rlarge USING (id);
>   QUERY PLAN
> ---
>  Nested Loop
>Join Filter: (small.id = rlarge.id)
>->  Seq Scan on small
>->  Foreign Scan on rlarge
> (4 rows)
>
>
> Is there a fundamental reason why the join condition does not get pushed down 
> into
> the foreign scan or is that an omission that can easily be fixed?
>

While creating the foreign table, if you specify use_remote_estimate =
true for the table OR do so for the foreign server, postgres_fdw
creates parameterized paths for that foreign relation. If using a
parameterized path reduces cost of the join, it will use a nested loop
join with inner relation parameterized by the outer relation, pushing
join conditions down into the foreign scan.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Push down more full joins in postgres_fdw

2016-09-13 Thread Ashutosh Bapat

On Tue, Sep 13, 2016 at 10:28 PM, Robert Haas  wrote:
> On Tue, Sep 6, 2016 at 9:07 AM, Ashutosh Bapat
>  wrote:
>> That's not true with the alias information. As long as we detect which
>> relations need subqueries, their RTIs are enough to create unique aliases
>> e.g. if a relation involves RTIs 1, 2, 3 corresponding subquery can have
>> alias r123 without conflicting with any other alias.
>
> What if RTI 123 is also used in the query?

Good catch. I actually meant some combination of 1, 2 and 3, which is
unique for a join between r1, r2 and r3.  How about r1_2_3 or
r1_r2_r3?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Printing bitmap objects in the debugger

2016-09-14 Thread Ashutosh Bapat

Hi All,
While working on partition-wise join, I had to examine Relids objects
many times. Printing the Bitmapset::words[] in binary format and then
inferring the relids takes time and is error prone. Instead I wrote a
function bms_to_char() which allocates a StringInfo, calls
outBitmapset() to decode Bitmapset as a set of integers and returns
the string. In order to examine a Relids object all one has to do is
execute 'p bms_to_char(bms_object) under gdb.

Is there a way, this can be included in the code? If it's available in
the code, developers don't have to apply the patch and compile it for
debugging.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 29b7712..b4cae11 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -93,20 +93,22 @@
 	 outNode(str, node->fldname))
 
 /* Write a bitmapset field */
 #define WRITE_BITMAPSET_FIELD(fldname) \
 	(appendStringInfo(str, " :" CppAsString(fldname) " "), \
 	 _outBitmapset(str, node->fldname))
 
 
 #define booltostr(x)  ((x) ? "true" : "false")
 
+char *bms_to_char(const Bitmapset *bms);
+
 
 /*
  * _outToken
  *	  Convert an ordinary string (eg, an identifier) into a form that
  *	  will be decoded back to a plain token by read.c's functions.
  *
  *	  If a null or empty string is given, it is encoded as "<>".
  */
 static void
 _outToken(StringInfo str, const char *s)
@@ -204,20 +206,31 @@ _outBitmapset(StringInfo str, const Bitmapset *bms)
 }
 
 /* for use by extensions which define extensible nodes */
 void
 outBitmapset(StringInfo str, const Bitmapset *bms)
 {
 	_outBitmapset(str, bms);
 }
 
 /*
+ * TODO: remove, used for debugging through gdb.
+ */
+char *
+bms_to_char(const Bitmapset *bms)
+{
+	StringInfo str = makeStringInfo();
+	outBitmapset(str, bms);
+	return str->data;
+}
+
+/*
  * Print the value of a Datum given its type.
  */
 void
 outDatum(StringInfo str, Datum value, int typlen, bool typbyval)
 {
 	Size		length,
 i;
 	char	   *s;
 
 	length = datumGetSize(value, typbyval, typlen);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: [HACKERS] Error running custom plugin: “output plugins have to declare the _PG_output_plugin_init symbol”

2016-09-14 Thread Ashutosh Bapat

On Wed, Sep 14, 2016 at 4:03 PM, valeriof  wrote:
> Hi, I'm kind of new to Postgres and I'm trying to create a custom output
> plugin for logical replication (Postgres is 9.5.4 version and I'm building
> the project from a Windows 8/64 bit machine - same machine where the db is
> installed).
> I started from the code of the sample test_decoding, and I was trying to
> simply rebuild it under a new name and install it into Postgres to see if
> the module works. Once done, I will start modifying the code.
>
> My project is built in Visual Studio 2013 and the only steps I took were to
> copy the built assembly under Postgres lib folder. When I run the command:
>
> postgres=# SELECT * FROM
> pg_create_logical_replication_slot('regression_slot', 'my_decoding');
>
> I get an error saying that "output plugins have to declare the
> _PG_output_plugin_init symbol".
>

The error comes from LoadOutputPlugin(), when the call to
load_external_function() fails. load_external_function() tries to
search for given function in the given file. The file name is same as
plugin name. So, it may be that it doesn't find a file with
my_decoding library in the installation. If test_decoding plugin is
working in your case, please check if you can find my_decoding library
in the same location. If not, that's the problem.

> In my code, I have the function declared as extern:
>
> extern void _PG_init(void);
> extern void _PG_output_plugin_init(OutputPluginCallbacks *cb);
>
> and the body of the function is defined somewhere below it.
>
> The actual code is just a copy of the test_decoding sample:
> https://github.com/postgres/postgres/blob/REL9_5_STABLE/contrib/test_decoding/test_decoding.c
>
> I'm not sure if the deployment process is ok (just copying the dll) or if
> there is some other step to take. Can anyone shed some light?
>

It's hard to tell what's wrong exactly, without seeing the changes you
have made. But, it looks like while copying test_decoding directory,
you have forgot to replace some instance/s of test_decoding with
my_decoding.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Printing bitmap objects in the debugger

2016-09-14 Thread Ashutosh Bapat

On Wed, Sep 14, 2016 at 5:31 PM, Pavan Deolasee
 wrote:
>
> On Wed, Sep 14, 2016 at 3:46 PM, Pavan Deolasee 
> wrote:
>>
>>
>>
>>  lately I'm using LVM debugger (which probably does not have something
>> equivalent),
>
>
> And I was so clueless about lldb's powerful scripting interface. For
> example, you can write something like this in bms_utils.py:
>
> import lldb
>
> def print_bms_members (bms):
> words = bms.GetChildMemberWithName("words")
> nwords = int(bms.GetChildMemberWithName("nwords").GetValue())
>
> ret = 'nwords = {0} bitmap: '.format(nwords,)
> for i in range(0, nwords):
> ret += hex(int(words.GetChildAtIndex(0, lldb.eNoDynamicValues,
> True).GetValue()))
>
> return ret
>

Thanks a lot for digging into it.

> And then do this while attached to lldb debugger:
>
> Process 99659 stopped
> * thread #1: tid = 0x59ba69, 0x0001090b012f
> postgres`bms_add_member(a=0x7fe60a0351f8, x=10) + 15 at bitmapset.c:673,
> queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
> frame #0: 0x0001090b012f
> postgres`bms_add_member(a=0x7fe60a0351f8, x=10) + 15 at bitmapset.c:673
>670 int wordnum,
>671 bitnum;
>672
> -> 673 if (x < 0)
>674 elog(ERROR, "negative bitmapset member not allowed");
>675 if (a == NULL)
>676 return bms_make_singleton(x);
> (lldb) script
> Python Interactive Interpreter. To exit, type 'quit()', 'exit()' or Ctrl-D.
>>>> from bms_utils import *
>>>> bms = lldb.frame.FindVariable ("a")
>>>> print print_bms_members(bms)
> nwords = 1 bitmap: 0x200
>

I can get that kind of output by command
p *bms
p/x (or p/b) *bms->words@(bms->nwords) in gdb.

But I can certainly extend the script you wrote above to print more
meaningful output similar to outBitmapset(). But then this would be
specific to LLVM. GDB too seems to have a similar interface
https://sourceware.org/gdb/wiki/PythonGdbTutorial, so I can probably
use above script with some modifications with GDB as well. Python
script will be easier to maintain as compared to maintaining a patch
that needs to be applied and compiled.

Said that, I am not sure if every debugger supported on every platform
we support has these features. Or may be developers work on only those
platforms which have such powerful debuggers, so it's ok.

Every debugger usually has much easier way to call a function and
print its output, so having a function like the one I have in the
patch makes things easy for all the debuggers and may be developers
not familiar with python.

>
> The complete API reference is available here
> http://lldb.llvm.org/python_reference/index.html
>
> Looks like an interesting SoC project to write useful lldb/gdb scripts to
> print internal structures for ease of debugging :-)
>

+1, if we can include something like that in the repository so as to
avoid every developer maintaining a script of his/her own.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Printing bitmap objects in the debugger

2016-09-14 Thread Ashutosh Bapat

>> Alvaro Herrera  writes:
>>> I don't understand.  Why don't you just use "call pprint(the bitmapset)"
>>> in the debugger?
>>
>> Bitmapsets aren't Nodes, so pprint doesn't work directly on them.
>> I usually find that I can pprint some node containing the value(s)
>> I'm interested in, but maybe that isn't working for Ashutosh's
>> particular case.

that's right.
>
> There are many loose (ie, not inside any Node) Relids variables within the
> optimizer code.  Perhaps Ashutosh ended up needing to look at those a lot.

that's right too.

In joinrels.c for example we are manipulating Relids so many times.
[ashutosh@ubuntu pg_head]grep bms_ src/backend/optimizer/path/joinrels.c | wc -l
69
There are many other instances of this in other optimizer and planner
files. There are other places where we manipulate Bitmapsets.

And not every Relids object computed is contained in a Node. So,
pprint() doesn't help much.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Printing bitmap objects in the debugger

2016-09-14 Thread Ashutosh Bapat

On Wed, Sep 14, 2016 at 8:45 PM, Alvaro Herrera
 wrote:
> Tom Lane wrote:
>> Ashutosh Bapat  writes:
>> > While working on partition-wise join, I had to examine Relids objects
>> > many times. Printing the Bitmapset::words[] in binary format and then
>> > inferring the relids takes time and is error prone.
>>
>> FWIW, I generally rely on pprint() to look at planner data structures
>> from the debugger.
>
> Also:
> http://blog.pgaddict.com/posts/making-debugging-with-gdb-a-bit-easier
> This may not address the issue directly, but it's probably very helpful.

Thanks for the reference. I think this is something similar to what
Pavan suggested in the mail thread.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Push down more full joins in postgres_fdw

2016-09-14 Thread Ashutosh Bapat

On Wed, Sep 14, 2016 at 8:52 PM, Robert Haas  wrote:
> On Tue, Sep 13, 2016 at 11:38 PM, Ashutosh Bapat
>  wrote:
>> On Tue, Sep 13, 2016 at 10:28 PM, Robert Haas  wrote:
>>> On Tue, Sep 6, 2016 at 9:07 AM, Ashutosh Bapat
>>>  wrote:
>>>> That's not true with the alias information. As long as we detect which
>>>> relations need subqueries, their RTIs are enough to create unique aliases
>>>> e.g. if a relation involves RTIs 1, 2, 3 corresponding subquery can have
>>>> alias r123 without conflicting with any other alias.
>>>
>>> What if RTI 123 is also used in the query?
>>
>> Good catch. I actually meant some combination of 1, 2 and 3, which is
>> unique for a join between r1, r2 and r3.  How about r1_2_3 or
>> r1_r2_r3?
>
> Sure, something like that can work, but if you have a big enough join
> the identifier might get too long.  I'm not sure why it wouldn't work
> to just use the lowest RTI involved in the join, though; the others
> won't appear anywhere else at that query level.
>

Yes, that will work too and is much more preferred than long r1_2_3.
The idea is to come with a unique alias name from RTIs involved in
that relation. Thinking loudly, r1_2_3 is more descriptive to debug
issues. It tells that the subquery it refers to covers RTIs 1, 2 and
3. That information may be quite helpful to understand the remote SQL.
r1 on the other hand can refer to relation with RTI 1 or a join
relation where least RTI is 1. That can be solved by using s for
subquery and r for a bare relation. But it still doesn't tell us
which all relations are involved.

But since the subquery is part of remote SQL and we have to take a
look at it to find out meaning of individual columns, that benefit may
be smaller compared to the convenience of smaller alias. So +1 for
using the smallest RTI to indicate a subquery.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Declarative partitioning - another take

2016-09-15 Thread Ashutosh Bapat

 | Modifiers | Storage | Stats target | Description
>> +-+---+-+--+-
>>  c1 | integer |   | plain   |  |
>> Partition Key: RANGE (c1)
>> Partitions: test_subpart_p1 FOR VALUES START (1) END (100) INCLUSIVE
>>
>> \d+ test_subpart_p1
>>Table "public.test_subpart_p1"
>>  Column |  Type   | Modifiers | Storage | Stats target | Description
>> +-+---+-+--+-
>>  c1 | integer |   | plain   |  |
>> Partition Of: test_subpart FOR VALUES START (1) END (100) INCLUSIVE
>> Partition Key: RANGE (c1)
>> Partitions: test_subpart_p1_sub1 FOR VALUES START (101) END (200)
>>
>> insert into test_subpart values (50);
>> ERROR:  no partition of relation "test_subpart_p1" found for row
>> DETAIL:  Failing row contains (50).
>> insert into test_subpart values (150);
>> ERROR:  no partition of relation "test_subpart" found for row
>> DETAIL:  Failing row contains (150).
>
> It seems that DDL should prevent the same column being used in partition
> key of lower level partitions.  I don't know how much sense it would make,
> but being able to use the same column as partition key of lower level
> partitions may be a feature useful to some users if they know what they
> are doing.  But this last part doesn't sound like a good thing.  I
> modified the patch such that lower level partitions cannot use columns
> used by ancestor tables.
>
>> -- Observation 3 : Getting cache lookup failed, when selecting list
>> partition table containing array.
>>
>> CREATE TABLE test_array ( i int,j int[],k text[]) PARTITION BY LIST (j);
>> CREATE TABLE test_array_p1 PARTITION OF test_array FOR VALUES IN ('{1}');
>> CREATE TABLE test_array_p2 PARTITION OF test_array FOR VALUES IN ('{2,2}');
>>
>> INSERT INTO test_array (i,j[1],k[1]) VALUES (1,1,1);
>> INSERT INTO test_array (i,j[1],j[2],k[1]) VALUES (2,2,2,2);
>>
>> postgres=# SELECT tableoid::regclass,* FROM test_array_p1;
>>tableoid| i |  j  |  k
>> ---+---+-+-
>>  test_array_p1 | 1 | {1} | {1}
>> (1 row)
>>
>> postgres=# SELECT tableoid::regclass,* FROM test_array_p2;
>>tableoid| i |   j   |  k
>> ---+---+---+-
>>  test_array_p2 | 2 | {2,2} | {2}
>> (1 row)
>>
>> postgres=# SELECT tableoid::regclass,* FROM test_array;
>> ERROR:  cache lookup failed for type 0
>
> That's a bug.  Fixed in the attached patch.
>
> PS: I'm going to have limited Internet access during this weekend and over
> the next week, so responses could be slow.  Sorry about that.
>
> Thanks,
> Amit



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel sec scan in plpgsql

2016-09-15 Thread Ashutosh Bapat

On Thu, Sep 15, 2016 at 9:15 PM, Alex Ignatov  wrote:
> Hello!
> Does parallel secscan works in plpgsql?
>

Parallel seq scan is a query optimization that will work independent
of the source of the query - i.e whether it comes directly from a
client or a procedural language like plpgsql. So, I guess, answer to
your question is yes. If you are expecting something else, more
context will help.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Printing bitmap objects in the debugger

2016-09-16 Thread Ashutosh Bapat

>
> I'd suggest that this is parallel to nodeToString() and therefore
> (a) should be placed beside it,

Done. Added it after nodeToString().

> (b) should be named like it,
> bmsToString() perhaps,

bmsToString() is fine. Used that name.

> and (c) should look more like it internally.
>

Done.

I have also added a declaration for this function in nodes.h after
definition of struct Bitmapset. WIthout this declaration compiler
gives warning "no previous declaration" of this function.

Tested it under the debugger

Breakpoint 1, make_join_rel (root=0x20cafb0, rel1=0x20e2998,
rel2=0x20dd2c0) at joinrels.c:664
(gdb) p bmsToString(rel1->relids)
$1 = 0x2102fd0 "(b 1 3)"
(gdb) p bmsToString(rel2->relids)
$2 = 0x2104bc0 "(b 4)"
(gdb) p bmsToString(joinrelids)
$3 = 0x2104fd8 "(b 1 3 4)"
(gdb) p bmsToString(joinrel->relids)
$4 = 0x2105998 "(b 1 3 4)"
(gdb) p bmsToString(joinrel->lateral_relids)
$5 = 0x2105db0 "(b)"
(gdb) p joinrel->lateral_relids
$6 = (Relids) 0x0

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 29b7712..e1bbcf7 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -3886,10 +3886,25 @@ outNode(StringInfo str, const void *obj)
 char *
 nodeToString(const void *obj)
 {
 	StringInfoData str;
 
 	/* see stringinfo.h for an explanation of this maneuver */
 	initStringInfo(&str);
 	outNode(&str, obj);
 	return str.data;
 }
+
+/*
+ * bmsToNode -
+ * 		returns the ascii representation of the Bitmapset as a palloc'd string
+ */
+char *
+bmsToString(const Bitmapset *bms)
+{
+	StringInfoData str;
+
+	/* see stringinfo.h for an explanation of this maneuver */
+	initStringInfo(&str);
+	outBitmapset(&str, bms);
+	return str.data;
+}
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 2f7efa8..b239b99 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -554,20 +554,21 @@ extern PGDLLIMPORT Node *newNodeMacroHolder;
 extern char *nodeToString(const void *obj);
 
 struct Bitmapset;/* not to include bitmapset.h here */
 struct StringInfoData;			/* not to include stringinfo.h here */
 extern void outNode(struct StringInfoData *str, const void *obj);
 extern void outToken(struct StringInfoData *str, const char *s);
 extern void outBitmapset(struct StringInfoData *str,
 			 const struct Bitmapset *bms);
 extern void outDatum(struct StringInfoData *str, uintptr_t value,
 		 int typlen, bool typbyval);
+extern char *bmsToString(const struct Bitmapset *bms);
 
 /*
  * nodes/{readfuncs.c,read.c}
  */
 extern void *stringToNode(char *str);
 extern struct Bitmapset *readBitmapset(void);
 extern uintptr_t readDatum(bool typbyval);
 extern bool *readBoolCols(int numCols);
 extern int *readIntCols(int numCols);
 extern Oid *readOidCols(int numCols);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Why postgres take RowExclusiveLock on all partition

2016-09-16 Thread Ashutosh Bapat

On Fri, Sep 16, 2016 at 4:31 PM, Sachin Kotwal  wrote:
> Hi Hackers,
>
>
> I checked if there  is update transaction on master table involved in
> partition.
> Postgresql takes  RowExclusiveLock on all partition tables.
>
> constraint exclusion is set to on.

I checked this under the debugger and found that only the partitions
which are scanned. The partitions excluded by constraints are not
locked.

postgres=# create table t1 (a int);
CREATE TABLE
postgres=# set constraint_exclusion to partition;
SET
postgres=# create table t1_p1() inherits (t1);
CREATE TABLE
postgres=# alter table t1_p1 add constraint a_part check (a > 0 and a < 100);
ALTER TABLE
postgres=# create table t1_p2() inherits (t1);
CREATE TABLE
postgres=# alter table t1_p2 add constraint a_part check (a > 100 and a < 200);
ALTER TABLE
postgres=# insert into t1_p1 select i from generate_series(1, 5) i;
INSERT 0 5
postgres=# insert into t1_p2 select i from generate_series(101, 105) i;
INSERT 0 5
postgres=# explain verbose select * from t1 where a > 100;
 QUERY PLAN
-
 Append  (cost=0.00..41.88 rows=851 width=4)
   ->  Seq Scan on public.t1  (cost=0.00..0.00 rows=1 width=4)
 Output: t1.a
 Filter: (t1.a > 100)
   ->  Seq Scan on public.t1_p2  (cost=0.00..41.88 rows=850 width=4)
 Output: t1_p2.a
 Filter: (t1_p2.a > 100)
(7 rows)

postgres=# explain verbose update t1 set a = a where a > 100;
  QUERY PLAN
--
 Update on public.t1  (cost=0.00..41.88 rows=851 width=10)
   Update on public.t1
   Update on public.t1_p2
   ->  Seq Scan on public.t1  (cost=0.00..0.00 rows=1 width=10)
 Output: t1.a, t1.ctid
 Filter: (t1.a > 100)
   ->  Seq Scan on public.t1_p2  (cost=0.00..41.88 rows=850 width=10)
 Output: t1_p2.a, t1_p2.ctid
 Filter: (t1_p2.a > 100)
(9 rows)

The RowExclusiveLock is taken in InitPlan(), which is called after the
partitions have been excluded.

 817│ foreach(l, resultRelations)
 818│ {
 819│ Index   resultRelationIndex =
lfirst_int(l);
 820│ Oid resultRelationOid;
 821│ RelationresultRelation;
 822│
 823│ resultRelationOid =
getrelid(resultRelationIndex, rangeTable);
 824├>resultRelation =
heap_open(resultRelationOid, RowExclusiveLock);
 825│ InitResultRelInfo(resultRelInfo,
 826│   resultRelation,
 827│
resultRelationIndex,
 828│
estate->es_instrument);
 829│ resultRelInfo++;
 830│ }

It does lock the parent table, since inheritance allows to have rows
in that table. If the constraints on that table are not enough to
exclude it by conditions, it will be scanned.

Am I missing something? It might help to have SQL commands you are
running. Also, can you please explain why do you think all the
partitions are locked in RowExclusiveLock mode.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Improvements in psql hooks for variables

2016-09-18 Thread Ashutosh Bapat

You may want to add this to the November commitfest
https://commitfest.postgresql.org/11/.

On Fri, Sep 16, 2016 at 6:05 PM, Daniel Verite  wrote:
>  Hi,
>
> Following the discussion on forbidding an AUTOCOMMIT off->on
> switch mid-transaction [1], attached is a patch that let the hooks
> return a boolean indicating whether a change is allowed.
>
> Using the hooks, bogus assignments to built-in variables can
> be dealt with more strictly.
>
> For example, pre-patch behavior:
>
>   =# \set ECHO errors
>   =# \set ECHO on
>   unrecognized value "on" for "ECHO"; assuming "none"
>   =# \echo :ECHO
>   on
>
> which has two problems:
> - we have to assume a value, even though we can't know what the user meant.
> - after assignment, the user-visible value of the variable diverges from its
> internal counterpart (pset.echo in this case).
>
>
> Post-patch:
>   =# \set ECHO errors
>   =# \set ECHO on
>   unrecognized value "on" for "ECHO"
>   \set: error while setting variable
>   =# \echo :ECHO
>   errors
>
> Both the internal pset.* state and the user-visible value are kept unchanged
> is the input value is incorrect.
>
> Concerning AUTOCOMMIT, autocommit_hook() could return false to forbid
> a switch when the conditions are not met.
>
>
> Another user-visible effect of the patch is that, using a bogus value
> for a built-in variable on the command-line becomes a fatal error
> that prevents psql to continue.
> This is not directly intended by the patch but is a consequence
> of SetVariable() failing.
>
> Example:
>   $ ./psql -vECHO=bogus
>   unrecognized value "bogus" for "ECHO"
>   psql: could not set variable "ECHO"
>   $ echo $?
>   1
>
> The built-in vars concerned by the change are:
>
> booleans: AUTOCOMMIT, ON_ERROR_STOP, QUIET, SINGLELINE, SINGLESTEP
>
> non-booleans: ECHO, ECHO_HIDDEN, ON_ERROR_ROLLBACK, COMP_KEYWORD_CASE,
>  HISTCONTROL, VERBOSITY, SHOW_CONTEXT
>
> We could go further to close the gap between pset.* and the built-in
> variables,
> by changing how they're initialized and forbidding deletion as Tom
> suggests in [2], but if there's negative feedback on the above changes,
> I think we should hear it first.
>
> [1]
> https://www.postgresql.org/message-id/f2cb5838-0ee9-4fe3-acc0-df77aeb7d4c7%40mm
> [2]
> https://www.postgresql.org/message-id/4695.1473961140%40sss.pgh.pa.us
>
>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org
> Twitter: @DanielVerite
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2016-09-18 Thread Ashutosh Bapat

On Fri, Sep 16, 2016 at 6:00 PM, Rajkumar Raghuwanshi
 wrote:
>
> On Fri, Sep 9, 2016 at 3:17 PM, Ashutosh Bapat
>  wrote:
>>
>> Hi All,
>>
>> PFA the patch to support partition-wise joins for partitioned tables. The
>> patch
>> is based on the declarative parition support patches provided by Amit
>> Langote
>> on 26th August 2016.
>
>
> I have applied declarative partitioning patches posted by Amit Langote on 26
> Aug 2016 and then partition-wise-join patch,  getting below error while make
> install.
>
> ../../../../src/include/nodes/relation.h:706: error: redefinition of typedef
> ‘PartitionOptInfo’
> ../../../../src/include/nodes/relation.h:490: note: previous declaration of
> ‘PartitionOptInfo’ was here
> make[4]: *** [gistbuild.o] Error 1
> make[4]: Leaving directory
> `/home/edb/Desktop/edb_work/WORKDB/PG/postgresql/src/backend/access/gist'
> make[3]: *** [gist-recursive] Error 2
> make[3]: Leaving directory
> `/home/edb/Desktop/edb_work/WORKDB/PG/postgresql/src/backend/access'
> make[2]: *** [access-recursive] Error 2
> make[2]: Leaving directory
> `/home/edb/Desktop/edb_work/WORKDB/PG/postgresql/src/backend'
> make[1]: *** [all-backend-recurse] Error 2
> make[1]: Leaving directory
> `/home/edb/Desktop/edb_work/WORKDB/PG/postgresql/src'
> make: *** [all-src-recurse] Error 2
>
> PS : I am using - gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17)
>
> Attached the patch for the fix of above error.

Thanks for the report. I will fix this in the next patch.



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Printing bitmap objects in the debugger

2016-09-18 Thread Ashutosh Bapat

Thanks a lot.

On Fri, Sep 16, 2016 at 7:07 PM, Tom Lane  wrote:
> Ashutosh Bapat  writes:
>>> I'd suggest that this is parallel to nodeToString() and therefore
>>> (a) should be placed beside it,
>
>> Done. Added it after nodeToString().
>
> Pushed, thanks.
>
> regards, tom lane



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Calculation of param_source_rels in add_paths_to_joinrel

2016-09-19 Thread Ashutosh Bapat

Hi,
There's code in add_paths_to_joinrel() which computes the set of
target relations that should overlap parameterization of any proposed
join path.

 120 /*
 121  * Decide whether it's sensible to generate parameterized
paths for this
 122  * joinrel, and if so, which relations such paths should
require.  There
 123  * is usually no need to create a parameterized result path
unless there
 124  * is a join order restriction that prevents joining one of
our input rels
 125  * directly to the parameter source rel instead of joining to the other
 126  * input rel.  (But see allow_star_schema_join().)  This restriction
 127  * reduces the number of parameterized paths we have to deal with at
 128  * higher join levels, without compromising the quality of
the resulting
 129  * plan.  We express the restriction as a Relids set that
must overlap the
 130  * parameterization of any proposed join path.
 131  */

The calculations that follow are based on joinrel->relids (baserels
covered by the join) and SpecialJoinInfo list in PlannerInfo. It is
not based on specific combination of relations being joined or the
paths being generated. We should probably do this computation once and
store the result in the joinrel and use it multiple times. That way we
can avoid computing the same set again and again for every pair of
joining relations and their order. Any reasons why we don't do this?

Attached patch moves this code to build_join_rel() and uses it in
add_paths_to_joinrel(). make check-world does not show any failures.

If this change is acceptable, we might actually remove
param_source_rels from JoinPathExtraData and directly access it from
joinrel in try_nestloop_path(), try_mergejoin_path() and
try_hashjoin_path().

Also, the way this code has been written, the declaration of variable
sjinfo masks the earlier declaration with the same name. I am not sure
if that's intentional, but may be we should use another variable name
for the inner sjinfo. I have not included that change in the patch.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


pg_param_source_rels.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: [HACKERS] Error running custom plugin: “output plugins have to declare the _PG_output_plugin_init symbol”

2016-09-20 Thread Ashutosh Bapat

On Tue, Sep 20, 2016 at 4:32 PM, valeriof  wrote:
> Hi Ashutosh,
> Thank you for your answer. At the end I realized that the PGDLLEXPORT
> keyword was missing from the functions definitions.
>
> As a side question, what are the options to debug the plugin while it's
> being executing? I've seen a debug plugin for Postgres but it seems more for
> SQL functions and stored procedures. Is it possible to attach the process
> from Visual Studio debugger?
>

I have never used Visual Studio, but you might find something useful
at 
https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Windows.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2016-09-20 Thread Ashutosh Bapat

Hi Rajkumar,


On Fri, Sep 16, 2016 at 6:00 PM, Rajkumar Raghuwanshi
 wrote:
>
> On Fri, Sep 9, 2016 at 3:17 PM, Ashutosh Bapat
>  wrote:
>>
>> Hi All,
>>
>> PFA the patch to support partition-wise joins for partitioned tables. The
>> patch
>> is based on the declarative parition support patches provided by Amit
>> Langote
>> on 26th August 2016.
>
>
> I have applied declarative partitioning patches posted by Amit Langote on 26
> Aug 2016 and then partition-wise-join patch,  getting below error while make
> install.
>
> ../../../../src/include/nodes/relation.h:706: error: redefinition of typedef
> ‘PartitionOptInfo’
> ../../../../src/include/nodes/relation.h:490: note: previous declaration of
> ‘PartitionOptInfo’ was here
> make[4]: *** [gistbuild.o] Error 1
> make[4]: Leaving directory
> `/home/edb/Desktop/edb_work/WORKDB/PG/postgresql/src/backend/access/gist'
> make[3]: *** [gist-recursive] Error 2
> make[3]: Leaving directory
> `/home/edb/Desktop/edb_work/WORKDB/PG/postgresql/src/backend/access'
> make[2]: *** [access-recursive] Error 2
> make[2]: Leaving directory
> `/home/edb/Desktop/edb_work/WORKDB/PG/postgresql/src/backend'
> make[1]: *** [all-backend-recurse] Error 2
> make[1]: Leaving directory
> `/home/edb/Desktop/edb_work/WORKDB/PG/postgresql/src'
> make: *** [all-src-recurse] Error 2
>
> PS : I am using - gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17)
>

Thanks for the report and the patch.

This is fixed by the patch posted with
https://www.postgresql.org/message-id/CAFjFpRdRFWMc4zNjeJB6p1Ncpznc9DMdXfZJmVK5X_us5zeD9Q%40mail.gmail.com.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Declarative partitioning - another take

2016-09-21 Thread Ashutosh Bapat

ical is ready to handle).  This approach will be very
> limiting as then range partitions will be limited to columns of int,
> bigint and date type only.
>
> One more option is we let the user specify the canonicalize function next
> to the column name when defining the partition key.  If not specified, we
> hard-code one for the types for which we will be implementing a
> canonicalize function (ie, above mentioned types).  In other cases, we
> just don't have one and hence if an unexpected result occurs when creating
> a new partition, it's up to the user to realize what happened.  Of course,
> we will be mentioning in the documentation why a canonicalize function is
> necessary and how to write one.  Note that this canonicalize function
> comes into play only when defining new partitions, it has no role beyond
> that point.
>
>> -- Observation 2 : able to create sub-partition out of the range set for
>> main table, causing not able to insert data satisfying any of the partition.
>>
>> create table test_subpart (c1 int) partition by range (c1);
>> create table test_subpart_p1 partition of test_subpart for values start (1)
>> end (100) inclusive partition by range (c1);
>> create table test_subpart_p1_sub1 partition of test_subpart_p1 for values
>> start (101) end (200);
>>
>> \d+ test_subpart
>>  Table "public.test_subpart"
>>  Column |  Type   | Modifiers | Storage | Stats target | Description
>> +-+---+-+--+-
>>  c1 | integer |   | plain   |  |
>> Partition Key: RANGE (c1)
>> Partitions: test_subpart_p1 FOR VALUES START (1) END (100) INCLUSIVE
>>
>> \d+ test_subpart_p1
>>Table "public.test_subpart_p1"
>>  Column |  Type   | Modifiers | Storage | Stats target | Description
>> +-+---+-+--+-
>>  c1 | integer |   | plain   |  |
>> Partition Of: test_subpart FOR VALUES START (1) END (100) INCLUSIVE
>> Partition Key: RANGE (c1)
>> Partitions: test_subpart_p1_sub1 FOR VALUES START (101) END (200)
>>
>> insert into test_subpart values (50);
>> ERROR:  no partition of relation "test_subpart_p1" found for row
>> DETAIL:  Failing row contains (50).
>> insert into test_subpart values (150);
>> ERROR:  no partition of relation "test_subpart" found for row
>> DETAIL:  Failing row contains (150).
>
> It seems that DDL should prevent the same column being used in partition
> key of lower level partitions.  I don't know how much sense it would make,
> but being able to use the same column as partition key of lower level
> partitions may be a feature useful to some users if they know what they
> are doing.  But this last part doesn't sound like a good thing.  I
> modified the patch such that lower level partitions cannot use columns
> used by ancestor tables.
>
>> -- Observation 3 : Getting cache lookup failed, when selecting list
>> partition table containing array.
>>
>> CREATE TABLE test_array ( i int,j int[],k text[]) PARTITION BY LIST (j);
>> CREATE TABLE test_array_p1 PARTITION OF test_array FOR VALUES IN ('{1}');
>> CREATE TABLE test_array_p2 PARTITION OF test_array FOR VALUES IN ('{2,2}');
>>
>> INSERT INTO test_array (i,j[1],k[1]) VALUES (1,1,1);
>> INSERT INTO test_array (i,j[1],j[2],k[1]) VALUES (2,2,2,2);
>>
>> postgres=# SELECT tableoid::regclass,* FROM test_array_p1;
>>tableoid| i |  j  |  k
>> ---+---+-+-
>>  test_array_p1 | 1 | {1} | {1}
>> (1 row)
>>
>> postgres=# SELECT tableoid::regclass,* FROM test_array_p2;
>>tableoid| i |   j   |  k
>> ---+---+---+-
>>  test_array_p2 | 2 | {2,2} | {2}
>> (1 row)
>>
>> postgres=# SELECT tableoid::regclass,* FROM test_array;
>> ERROR:  cache lookup failed for type 0
>
> That's a bug.  Fixed in the attached patch.
>
> PS: I'm going to have limited Internet access during this weekend and over
> the next week, so responses could be slow.  Sorry about that.
>
> Thanks,
> Amit



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Declarative partitioning - another take

2016-09-22 Thread Ashutosh Bapat

For list partitions, the ListInfo stores the index maps for values
i.e. the index of the partition to which the value belongs. Those
indexes are same as the indexes in partition OIDs array and come from
the catalogs. In case a user creates two partitioned tables with
exactly same lists for partitions but specifies them in a different
order, the OIDs are stored in the order specified. This means that
index array for these tables come out different. equal_list_info()
works around that by creating an array of mappings and checks whether
that mapping is consistent for all values. This means we will create
the mapping as many times as equal_list_info() is called, which is
expected to be more than the number of time
RelationBuildPartitionDescriptor() is called. Instead, if we
"canonicalise" the indexes so that they come out exactly same for
similarly partitioned tables, we build the mapping only once and
arrange OIDs accordingly.

Here's patch to do that. I have ran make check with this and it didn't
show any failure. Please consider this to be included in your next set
of patches.

That helps partition-wise join as well. For partition-wise join (and
further optimizations for partitioned tables), we create a list of
canonical partition schemes. In this list two similarly partitioned
tables share partition scheme pointer. A join between relations with
same partition scheme pointer can be joined partition-wise. It's
important that the indexes in partition scheme match to the OIDs array
to find matching RelOptInfos for partition-wise join.

On Thu, Sep 22, 2016 at 11:12 AM, Ashutosh Bapat
 wrote:
> Hi Amit,
> Following sequence of DDLs gets an error
> --
> -- multi-leveled partitions
> --
> CREATE TABLE prt1_l (a int, b int, c varchar) PARTITION BY RANGE(a);
> CREATE TABLE prt1_l_p1 PARTITION OF prt1_l FOR VALUES START (0) END
> (250) PARTITION BY RANGE (b);
> CREATE TABLE prt1_l_p1_p1 PARTITION OF prt1_l_p1 FOR VALUES START (0) END 
> (100);
> CREATE TABLE prt1_l_p1_p2 PARTITION OF prt1_l_p1 FOR VALUES START
> (100) END (250);
> CREATE TABLE prt1_l_p2 PARTITION OF prt1_l FOR VALUES START (250) END
> (500) PARTITION BY RANGE (c);
> CREATE TABLE prt1_l_p2_p1 PARTITION OF prt1_l_p2 FOR VALUES START
> ('0250') END ('0400');
> CREATE TABLE prt1_l_p2_p2 PARTITION OF prt1_l_p2 FOR VALUES START
> ('0400') END ('0500');
> CREATE TABLE prt1_l_p3 PARTITION OF prt1_l FOR VALUES START (500) END
> (600) PARTITION BY RANGE ((b + a));
> ERROR:  cannot use column or expression from ancestor partition key
>
> The last statement is trying create subpartitions by range (b + a),
> which contains a partition key from ancestor partition key but is not
> exactly same as that. In fact it contains some extra columns other
> than the ancestor partition key columns. Why do we want to prohibit
> such cases?
>
> On Thu, Sep 15, 2016 at 2:23 PM, Amit Langote
>  wrote:
>>
>> Hi
>>
>> On 2016/09/09 17:55, Amit Langote wrote:
>>> On 2016/09/06 22:04, Amit Langote wrote:
>>>> Will fix.
>>>
>>> Here is an updated set of patches.
>>
>> An email from Rajkumar somehow managed to break out of this thread.
>> Quoting his message below so that I don't end up replying with patches on
>> two different threads.
>>
>> On 2016/09/14 16:58, Rajkumar Raghuwanshi wrote:
>>> I have Continued with testing declarative partitioning with the latest
>>> patch. Got some more observation, given below
>>
>> Thanks a lot for testing.
>>
>>> -- Observation 1 : Getting overlap error with START with EXCLUSIVE in range
>>> partition.
>>>
>>> create table test_range_bound ( a int) partition by range(a);
>>> --creating a partition to contain records {1,2,3,4}, by default 1 is
>>> inclusive and 5 is exclusive
>>> create table test_range_bound_p1 partition of test_range_bound for values
>>> start (1) end (5);
>>> --now trying to create a partition by explicitly mentioning start is
>>> exclusive to contain records {5,6,7}, here trying to create with START with
>>> 4 as exclusive so range should be 5 to 8, but getting partition overlap
>>> error.
>>> create table test_range_bound_p2 partition of test_range_bound for values
>>> start (4) EXCLUSIVE end (8);
>>> ERROR:  partition "test_range_bound_p2" would overlap partition
>>> "test_range_bound_p1"
>>
>> Wow, this is bad.  What is needed in this case is "canonicalization" of
>> the range partition bounds specified in the command.  Range types do this
>> and hence an equivalent test done with range type values would disagree
>> with the res

Re: [HACKERS] Declarative partitioning - another take

2016-09-22 Thread Ashutosh Bapat

On Thu, Sep 22, 2016 at 1:02 PM, Ashutosh Bapat
 wrote:
> For list partitions, the ListInfo stores the index maps for values
> i.e. the index of the partition to which the value belongs. Those
> indexes are same as the indexes in partition OIDs array and come from
> the catalogs. In case a user creates two partitioned tables with
> exactly same lists for partitions but specifies them in a different
> order, the OIDs are stored in the order specified. This means that
> index array for these tables come out different. equal_list_info()
> works around that by creating an array of mappings and checks whether
> that mapping is consistent for all values. This means we will create
> the mapping as many times as equal_list_info() is called, which is
> expected to be more than the number of time
> RelationBuildPartitionDescriptor() is called. Instead, if we
> "canonicalise" the indexes so that they come out exactly same for
> similarly partitioned tables, we build the mapping only once and
> arrange OIDs accordingly.
>
> Here's patch to do that. I have ran make check with this and it didn't
> show any failure. Please consider this to be included in your next set
> of patches.

The patch has an if condition as statement by itself
+if (l1->null_index != l2->null_index);
 return false;

There shouldn't be ';' at the end. It looks like in the tests you have
added the function always bails out before it reaches this statement.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Aggregate Push Down - Performing aggregation on foreign server

2016-09-26 Thread Ashutosh Bapat

1 where c1
= 13) q(a) right join ft2 on (q.a = ft2.c1) where ft2.c1 between 10
and 15 group by q.a order by 1 nulls last;

26. Instead of the testcase below a test which has window aggregates over a
pushed down aggregate makes sense in the context of aggregate pushdown.
+-- WindowAgg
+explain (verbose, costs off)
+select c2, count(c2) over (partition by c2%2) from ft2 where c2 < 10
order by 1 limit 5 offset 95;

postgres_fdw.out has grown by 2000 lines because of testcases in this
patch. We need to compact the testcases so that easier to maintain in
the future.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Push down more full joins in postgres_fdw

2016-09-26 Thread Ashutosh Bapat

On Mon, Sep 26, 2016 at 1:05 PM, Etsuro Fujita
 wrote:
> On 2016/09/15 15:29, Ashutosh Bapat wrote:
>>
>> On Wed, Sep 14, 2016 at 8:52 PM, Robert Haas 
>> wrote:
>
>
>>> I'm not sure why it wouldn't work
>>> to just use the lowest RTI involved in the join, though; the others
>>> won't appear anywhere else at that query level.
>
>
>> So +1 for
>> using the smallest RTI to indicate a subquery.
>
>
> +1 for the general idea.
>
> ISTM that the use of the same RTI for subqueries in multi-levels in a remote
> SQL makes the SQL a bit difficult to read.  How about using the position of
> the join rel in join_rel_list, (more precisely, the position plus
> list_length(root->parse->rtable)), instead?
>
We switch to hash table to maintain the join RelOptInfos when the
number of joins grows larger, where the position won't make much
sense. We might differentiate between a base relation alias and
subquery alias by using different prefixes like "r" and "s"  resp.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Transactions involving multiple postgres foreign servers

2016-09-26 Thread Ashutosh Bapat

My original patch added code to manage the files for 2 phase
transactions opened by the local server on the remote servers. This
code was mostly inspired from the code in twophase.c which manages the
file for prepared transactions. The logic to manage 2PC files has
changed since [1] and has been optimized. One of the things I wanted
to do is see, if those optimizations are applicable here as well. Have
you considered that?


[1]. 
https://www.postgresql.org/message-id/74355FCF-AADC-4E51-850B-47AF59E0B215%40postgrespro.ru

On Fri, Aug 26, 2016 at 11:43 AM, Ashutosh Bapat
 wrote:
>
>
> On Fri, Aug 26, 2016 at 11:37 AM, Masahiko Sawada 
> wrote:
>>
>> On Fri, Aug 26, 2016 at 3:03 PM, Ashutosh Bapat
>>  wrote:
>> >
>> >
>> > On Fri, Aug 26, 2016 at 11:22 AM, Masahiko Sawada
>> > 
>> > wrote:
>> >>
>> >> On Fri, Aug 26, 2016 at 1:32 PM, Vinayak Pokale 
>> >> wrote:
>> >> > Hi All,
>> >> >
>> >> > Ashutosh proposed the feature 2PC for FDW for achieving atomic
>> >> > commits
>> >> > across multiple foreign servers.
>> >> > If a transaction make changes to more than two foreign servers the
>> >> > current
>> >> > implementation in postgres_fdw doesn't make sure that either all of
>> >> > them
>> >> > commit or all of them rollback their changes.
>> >> >
>> >> > We (Masahiko Sawada and me) reopen this thread and trying to
>> >> > contribute
>> >> > in
>> >> > it.
>> >> >
>> >> > 2PC for FDW
>> >> > 
>> >> > The patch provides support for atomic commit for transactions
>> >> > involving
>> >> > foreign servers. when the transaction makes changes to foreign
>> >> > servers,
>> >> > either all the changes to all the foreign servers commit or rollback.
>> >> >
>> >> > The new patch 2PC for FDW include the following things:
>> >> > 1. The patch 0001 introduces a generic feature. All kinds of FDW that
>> >> > support 2PC such as oracle_fdw, mysql_fdw, postgres_fdw etc. can
>> >> > involve
>> >> > in
>> >> > the transaction.
>> >> >
>> >> > Currently we can push some conditions down to shard nodes, especially
>> >> > in
>> >> > 9.6
>> >> > the directly modify feature has
>> >> > been introduced. But such a transaction modifying data on shard node
>> >> > is
>> >> > not
>> >> > executed surely.
>> >> > Using 0002 patch, that modify is executed with 2PC. It means that we
>> >> > almost
>> >> > can provide sharding solution using
>> >> > multiple PostgreSQL server (one parent node and several shared node).
>> >> >
>> >> > For multi master, we definitely need transaction manager but
>> >> > transaction
>> >> > manager probably can use this 2PC for FDW feature to manage
>> >> > distributed
>> >> > transaction.
>> >> >
>> >> > 2. 0002 patch makes postgres_fdw possible to use 2PC.
>> >> >
>> >> > 0002 patch makes postgres_fdw to use below APIs. These APIs are
>> >> > generic
>> >> > features which can be used by all kinds of FDWs.
>> >> >
>> >> > a. Execute PREAPRE TRANSACTION and COMMIT/ABORT PREAPRED instead
>> >> > of
>> >> > COMMIT/ABORT on foreign server which supports 2PC.
>> >> > b. Manage information of foreign prepared transactions resolver
>> >> >
>> >> > Masahiko Sawada will post the patch.
>> >> >
>> >> >
>> >>
>> >
>> > Thanks Vinayak and Sawada-san for taking this forward and basing your
>> > work
>> > on my patch.
>> >
>> >>
>> >> Still lot of work to do but attached latest patches.
>> >> These are based on the patch Ashutosh posted before, I revised it and
>> >> divided into two patches.
>> >> Compare with original patch, patch of pg_fdw_xact_resolver and
>> >> documentation are lacked.
>> >
>> >
>> > I am not able to understand the last statement.
>>
>> Sorry to confuse you.
>>
>> > Do you mean to say that your patches do not have pg_fdw_xact_resolver()
>> > and
>> > documentation that my patches had?
>>
>> Yes.
>> I'm confirming them that your patches had.
>
>
> Thanks for the clarification. I had added pg_fdw_xact_resolver() to resolve
> any transactions which can not be resolved immediately after they were
> prepared. There was a comment from Kevin (IIRC) that leaving transactions
> unresolved on the foreign server keeps the resources locked on those
> servers. That's not a very good situation. And nobody but the initiating
> server can resolve those. That functionality is important to make it a
> complete 2PC solution. So, please consider it to be included in your first
> set of patches.
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Push down more full joins in postgres_fdw

2016-09-26 Thread Ashutosh Bapat

On Mon, Sep 26, 2016 at 4:06 PM, Etsuro Fujita
 wrote:
> On 2016/09/26 18:06, Ashutosh Bapat wrote:
>>
>> On Mon, Sep 26, 2016 at 1:05 PM, Etsuro Fujita
>>  wrote:
>
>
>>> ISTM that the use of the same RTI for subqueries in multi-levels in a
>>> remote
>>> SQL makes the SQL a bit difficult to read.  How about using the position
>>> of
>>> the join rel in join_rel_list, (more precisely, the position plus
>>> list_length(root->parse->rtable)), instead?
>
>
>> We switch to hash table to maintain the join RelOptInfos when the
>> number of joins grows larger, where the position won't make much
>> sense.
>
>
> That's right, but we still store the joinrel into join_rel_list after
> creating that hash table.  That hash table is just for fast lookup.  See
> build_join_rel.

Sorry, I wasn't clear in my reply. As the list grows, it will take
longer to locate the RelOptInfo of the given join. Doing that for
creating an alias seems an overkill.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Transactions involving multiple postgres foreign servers

2016-09-26 Thread Ashutosh Bapat

On Mon, Sep 26, 2016 at 5:25 PM, Masahiko Sawada  wrote:
> On Mon, Sep 26, 2016 at 7:28 PM, Ashutosh Bapat
>  wrote:
>> My original patch added code to manage the files for 2 phase
>> transactions opened by the local server on the remote servers. This
>> code was mostly inspired from the code in twophase.c which manages the
>> file for prepared transactions. The logic to manage 2PC files has
>> changed since [1] and has been optimized. One of the things I wanted
>> to do is see, if those optimizations are applicable here as well. Have
>> you considered that?
>>
>>
>
> Yeah, we're considering it.
> After these changes are committed, we will post the patch incorporated
> these changes.
>
> But what we need to do first is the discussion in order to get consensus.
> Since current design of this patch is to transparently execute DCL of
> 2PC on foreign server, this code changes lot of code and is
> complicated.

Can you please elaborate. I am not able to understand what DCL is
involved here. According to [1], examples of DCL are GRANT and REVOKE
command.

> Another approach I have is to push down DCL to only foreign servers
> that support 2PC protocol, which is similar to DML push down.
> This approach would be more simpler than current idea and is easy to
> use by distributed transaction manager.

Again, can you please elaborate, how that would be different from the
current approach and how does it simplify the code.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Aggregate Push Down - Performing aggregation on foreign server

2016-09-26 Thread Ashutosh Bapat

This patch will need some changes to conversion_error_callback(). That
function reports an error in case there was an error converting the
result obtained from the foreign server into an internal datum e.g.
when the string returned by the foreign server is not acceptable by
local input function for the expected datatype. In such cases, the
input function will throw error and conversion_error_callback() will
provide appropriate context for that error. postgres_fdw.sql has tests
to test proper context

-- ===
-- conversion error
-- ===
ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE int;
SELECT * FROM ft1 WHERE c1 = 1;  -- ERROR
SELECT  ft1.c1,  ft2.c2, ft1.c8 FROM ft1, ft2 WHERE ft1.c1 = ft2.c1
AND ft1.c1 = 1; -- ERROR
SELECT  ft1.c1,  ft2.c2, ft1 FROM ft1, ft2 WHERE ft1.c1 = ft2.c1 AND
ft1.c1 = 1; -- ERROR
ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE user_enum;

Right now we report the column name in the error context. This needs
to change for aggregate pushdown, which is pushing down expressions.
Right now, in case the aggregate on foreign server produces a string
unacceptable locally, it would crash at
4982 Assert(IsA(var, Var));
4983
4984 rte = rt_fetch(var->varno, estate->es_range_table);

since it's not a Var node as expected.

We need to fix the error context to provide meaningful information or
at least not crash. This has been discussed briefly in [1].

[1] 
https://www.postgresql.org/message-id/flat/CAFjFpRdcC7Ykb1SkARBYikx9ubKniBiAgHqMD9e47TxzY2EYFw%40mail.gmail.com#cafjfprdcc7ykb1skarbyikx9ubknibiaghqmd9e47txzy2e...@mail.gmail.com

On Fri, Sep 16, 2016 at 7:15 PM, Jeevan Chalke
 wrote:
> Hi,
>
> On Mon, Sep 12, 2016 at 5:17 PM, Jeevan Chalke
>  wrote:
>>
>> Thanks Ashutosh for the detailed review comments.
>>
>> I am working on it and will post updated patch once I fix all your
>> concerns.
>>
>>
>
> Attached new patch fixing the review comments.
>
> Here are few comments on the review points:
>
> 1. Renamed deparseFromClause() to deparseFromExpr() and
> deparseAggOrderBy() to appendAggOrderBy()
>
> 2. Done
>
> 3. classifyConditions() assumes list expressions of type RestrictInfo. But
> HAVING clause expressions (and JOIN conditions) are plain expressions. Do
> you mean we should modify the classifyConditions()? If yes, then I think it
> should be done as a separate (cleanup) patch.
>
> 4, 5. Both done.
>
> 6. Per my understanding, I think checking for just aggregate function is
> enough as we are interested in whole aggregate result. Meanwhile I will
> check whether we need to find and check shippability of transition,
> combination and finalization functions or not.
>
> 7, 8, 9, 10, 11, 12. All done.
>
> 13. Fixed. However instead of adding new function made those changes inline.
> Adding support for deparsing List expressions separating list by comma can
> be
> considered as cleanup patch as it will touch other code area not specific to
> aggregate push down.
>
> 14, 15, 16, 17. All done.
>
> 18. I think re-factoring estimate_path_cost_size() should be done separately
> as a cleanup patch too.
>
> 19, 20, 21, 22, 23, 24, 25, 26, 27, 28. All done.
>
> 29. I have tried passing input rel's relids to fetch_upper_rel() call in
> create_grouping_paths(). It solved this patch's issue, but ended up with
> few regression failures which is mostly plan changes. I am not sure whether
> we get good plan now or not as I have not analyzed them.
> So for this patch, I am setting relids in add_foreign_grouping_path()
> itself.
> However as suggested, used bms_copy(). I still kept the FIXME over there as
> I think it should have done by the core itself.
>
> 30, 31, 32, 33. All done.
>
> Let me know your views.
>
> Thanks
> --
> Jeevan B Chalke
> Principal Software Engineer, Product Development
> EnterpriseDB Corporation
> The Enterprise PostgreSQL Company
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Push down more full joins in postgres_fdw

2016-09-26 Thread Ashutosh Bapat

On Tue, Sep 27, 2016 at 8:48 AM, Etsuro Fujita
 wrote:
> On 2016/09/26 20:20, Ashutosh Bapat wrote:
>>
>> On Mon, Sep 26, 2016 at 4:06 PM, Etsuro Fujita
>>  wrote:
>>>
>>> On 2016/09/26 18:06, Ashutosh Bapat wrote:
>>>>
>>>> On Mon, Sep 26, 2016 at 1:05 PM, Etsuro Fujita
>>>>  wrote:
>
>
>>>>> ISTM that the use of the same RTI for subqueries in multi-levels in a
>>>>> remote
>>>>> SQL makes the SQL a bit difficult to read.  How about using the
>>>>> position
>>>>> of
>>>>> the join rel in join_rel_list, (more precisely, the position plus
>>>>> list_length(root->parse->rtable)), instead?
>
>
>>>> We switch to hash table to maintain the join RelOptInfos when the
>>>> number of joins grows larger, where the position won't make much
>>>> sense.
>
>
>>> That's right, but we still store the joinrel into join_rel_list after
>>> creating that hash table.
>
>
>> As the list grows, it will take
>> longer to locate the RelOptInfo of the given join. Doing that for
>> creating an alias seems an overkill.
>
>
> The join rel is appended to the end of the list, so I was thinking to get
> the position info by list_length during postgresGetForeignJoinPaths.

That's true only when the paths are being added to a newly created
joinrel. But that's not true always. We may add paths with different
joining order to an existing joinrel, in which case list_length would
not give its position. Am I missing something?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Declarative partitioning - another take

2016-09-26 Thread Ashutosh Bapat

>
> With this patch, the mapping is created *only once* during
> RelationBuildPartitionDesc() to assign canonical indexes to individual
> list values.  The partition OID array will also be rearranged such that
> using the new (canonical) index instead of the old
> catalog-scan-order-based index will retrieve the correct partition for
> that value.
>
> By the way, I fixed one thinko in your patch as follows:
>
> -result->oids[i] = oids[mapping[i]];
> +result->oids[mapping[i]] = oids[i];

While I can not spot any problem with this logic, when I make that
change and run partition_join testcase in my patch, it fails because
wrong partitions are matched for partition-wise join of list
partitions. In that patch, RelOptInfo of partitions are saved in
RelOptInfo of the parent by matching their OIDs. They are saved in the
same order as corresponding OIDs. Partition-wise join simply joins the
RelOptInfos at the same positions from both the parent RelOptInfos. I
can not spot an error in this logic too.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Declarative partitioning - another take

2016-09-27 Thread Ashutosh Bapat

>>
>> Please check if you are able to reproduce these errors in your
>> repository. I made sure that I cleaned up all partition-wise join code
>> before testing this, but ... .
>
> Thanks for the test case.  I can reproduce the same.
>
>> I tried to debug the problem somewhat. In set_append_rel_pathlist(),
>> it finds that at least one child has a parameterized path as the
>> cheapest path, so it doesn't create an unparameterized path for append
>> rel. At the same time there is a parameterization common to all the
>> children, so it doesn't create any path. There seem to be two problems
>> here
>> 1. The children from second level onwards may not be getting
>> parameterized for lateral references. That seems unlikely but
>> possible.

Did you check this? We may be missing on creating index scan paths
with parameterization. If we fix this, we don't need to
re-parameterize Append.

>> 2. Reparameterization should have corrected this, but
>> reparameterize_path() does not support AppendPaths.
>
> Hmm, 0005-Refactor-optimizer-s-inheritance-set-expansion-code-5.patch is
> certainly to be blamed here; if I revert the patch, the problem goes away.
>
> Based on 2 above, I attempted to add logic for AppendPath in
> reparameterize_path() as in the attached.  It fixes the reported problem
> and does not break any regression tests. If it's sane to do things this
> way, I will incorporate the attached into patch 0005 mentioned above.
> Thoughts?


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Declarative partitioning - another take

2016-09-27 Thread Ashutosh Bapat

On Tue, Sep 27, 2016 at 2:46 PM, Amit Langote
 wrote:
> On 2016/09/27 15:44, Ashutosh Bapat wrote:
>>> By the way, I fixed one thinko in your patch as follows:
>>>
>>> -result->oids[i] = oids[mapping[i]];
>>> +result->oids[mapping[i]] = oids[i];
>>
>> While I can not spot any problem with this logic, when I make that
>> change and run partition_join testcase in my patch, it fails because
>> wrong partitions are matched for partition-wise join of list
>> partitions. In that patch, RelOptInfo of partitions are saved in
>> RelOptInfo of the parent by matching their OIDs. They are saved in the
>> same order as corresponding OIDs. Partition-wise join simply joins the
>> RelOptInfos at the same positions from both the parent RelOptInfos. I
>> can not spot an error in this logic too.
>
> OTOH, using the original logic makes tuple routing put tuples into the
> wrong partitions.  When debugging why that was happening I discovered this
> and hence the proposed change.
>
> You mean that partition RelOptInfo's are placed using the canonical
> ordering of OIDs instead of catalog-scan-driven order, right?  If that's
> true, then there is no possibility of wrong pairing happening, even with
> the new ordering of OIDs in the partition descriptor (ie, the ordering
> that would be produced by my proposed method above).

right! I don't know what's wrong, will debug my changes.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Transactions involving multiple postgres foreign servers

2016-09-27 Thread Ashutosh Bapat

On Tue, Sep 27, 2016 at 2:54 PM, Masahiko Sawada  wrote:
> On Mon, Sep 26, 2016 at 9:07 PM, Ashutosh Bapat
>  wrote:
>> On Mon, Sep 26, 2016 at 5:25 PM, Masahiko Sawada  
>> wrote:
>>> On Mon, Sep 26, 2016 at 7:28 PM, Ashutosh Bapat
>>>  wrote:
>>>> My original patch added code to manage the files for 2 phase
>>>> transactions opened by the local server on the remote servers. This
>>>> code was mostly inspired from the code in twophase.c which manages the
>>>> file for prepared transactions. The logic to manage 2PC files has
>>>> changed since [1] and has been optimized. One of the things I wanted
>>>> to do is see, if those optimizations are applicable here as well. Have
>>>> you considered that?
>>>>
>>>>
>>>
>>> Yeah, we're considering it.
>>> After these changes are committed, we will post the patch incorporated
>>> these changes.
>>>
>>> But what we need to do first is the discussion in order to get consensus.
>>> Since current design of this patch is to transparently execute DCL of
>>> 2PC on foreign server, this code changes lot of code and is
>>> complicated.
>>
>> Can you please elaborate. I am not able to understand what DCL is
>> involved here. According to [1], examples of DCL are GRANT and REVOKE
>> command.
>
> I meant transaction management command such as PREPARE TRANSACTION and
> COMMIT/ABORT PREPARED command.
> The web page I refered might be wrong, sorry.
>
>>> Another approach I have is to push down DCL to only foreign servers
>>> that support 2PC protocol, which is similar to DML push down.
>>> This approach would be more simpler than current idea and is easy to
>>> use by distributed transaction manager.
>>
>> Again, can you please elaborate, how that would be different from the
>> current approach and how does it simplify the code.
>>
>
> The idea is just to push down PREPARE TRANSACTION, COMMIT/ROLLBACK
> PREPARED to foreign servers that support 2PC.
> With this idea, the client need to do following operation when foreign
> server is involved with transaction.
>
> BEGIN;
> UPDATE parent_table SET ...; -- update including foreign server
> PREPARE TRANSACTION 'xact_id';
> COMMIT PREPARED 'xact_id';
>
> The above PREPARE TRANSACTION and COMMIT PREPARED command are pushed
> down to foreign server.
> That is, the client needs to execute PREPARE TRANSACTION and
>
> In this idea, I think that we don't need to do followings,
>
> * Providing the prepare id of 2PC.
>   Current patch adds new API prepare_id_provider() but we can use the
> prepare id of 2PC that is used on parent server.
>
> * Keeping track of status of foreign servers.
>   Current patch keeps track of status of foreign servers involved with
> transaction but this idea is just to push down transaction management
> command to foreign server.
>   So I think that we no longer need to do that.

> COMMIT/ROLLBACK PREPARED explicitly.

The problem with this approach is same as one previously stated. If
the connection between local and foreign server is lost between
PREPARE and COMMIT the prepared transaction on the foreign server
remains dangling, none other than the local server knows what to do
with it and the local server has lost track of the prepared
transaction on the foreign server. So, just pushing down those
commands doesn't work.

>
> * Adding max_prepared_foreign_transactions parameter.
>   It means that the number of transaction involving foreign server is
> the same as max_prepared_transactions.
>

That isn't true exactly. max_prepared_foreign_transactions indicates
how many transactions can be prepared on the foreign server, which in
the method you propose should have a cap of max_prepared_transactions
* number of foreign servers.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 973 matches

Mail list logo