Re: [HACKERS] use_remote_estimate usage for join pushdown in postgres_fdw

2016-01-05 Thread Ashutosh Bapat
On Wed, Dec 16, 2015 at 11:41 PM, Tom Lane  wrote:

> Robert Haas  writes:
> > I like option #2.  I don't really have a strong reason for that, but
> > it feels intuitive to me that we err on the side of using remote
> > estimates when in doubt.
>
> If we believe that, why isn't the default value of use_remote_estimate
> true?
> (Maybe it should be.)
>
> Another option that should be considered is that joins should pay
> attention only to the server-level setting and not to the individual
> tables' settings.  This would surely be cheaper to implement, and
> it avoids any questions about whether to OR or AND the individual
> settings.
>
>
To implement server-level setting, we will need to pull it out of
ForeignServer structure like
 442 foreach(lc, fpinfo->server->options)
 443 {
 444 DefElem*def = (DefElem *) lfirst(lc);
 445
 446 if (strcmp(def->defname, "use_remote_estimate") == 0)
 447 fpinfo->use_remote_estimate = defGetBoolean(def);
 ...
 455 }

whereas use_remote_estimate setting for joining relations is readily
available in PgFdwRelationInfo

 58 /* Options extracted from catalogs. */
 59 booluse_remote_estimate;
 60 Costfdw_startup_cost;
 61 Costfdw_tuple_cost;
 62 List   *shippable_extensions;   /* OIDs of whitelisted
extensions */
 ...
 76 } ;

This use_remote_estimate is true if server level option is true or table
level option is true.

ANDing or ORing use_remote_estimate from the joining relations'
PgFdwRelationInfo looks cheaper than pulling it out of ForeignServer
structure.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] use_remote_estimate usage for join pushdown in postgres_fdw

2015-12-16 Thread Robert Haas
On Fri, Dec 11, 2015 at 4:44 AM, Ashutosh Bapat
 wrote:
> Hi All,
> postgres_fdw documentation says following about use_remote_estimate
> (http://www.postgresql.org/docs/devel/static/postgres-fdw.html)
> --
> use_remote_estimate
> This option, which can be specified for a foreign table or a foreign server,
> controls whether postgres_fdw issues remote EXPLAIN commands to obtain cost
> estimates. A setting for a foreign table overrides any setting for its
> server, but only for that table. The default is false.
> --
>
> I am trying to see, how should we use this option in the context of join
> pushdown and for
> that matter any pushdown involving more than one table.
>
> I came up with following arguments
> 1. Foreign base relations derive their use_remote_estimate setting either
> from the server setting or the per table setting. A join between two foreign
> relations should derive its use_remote_estimate setting from the joining
> relations (recursively). This means that we will use EXPLAIN to estimate
> costs of join if "all" the involved base foreign relations have
> use_remote_estimate true (either they derive it from the server level
> setting or table level setting).
>
> 2. Similar to 1, but use EXPLAIN to estimate costs if "any" of the involved
> base foreign relations have use_remote_estimate is true.
>
> 3. Since join between two foreign relations is not a table level phenomenon,
> but a server level phenomenon, we should use server level setting. This
> means that we will use EXPLAIN output to estimate costs of join if the
> foreign server has use_remote_estimate true, irrespective of the setting for
> individual foreign relations involved in that join.
>
> Unfortunately the documentation and comments in code do not say much about
> the intention (i.e. why and how is this setting expected to be used) of this
> setting in the context or server.
>
> The intention behind server level setting is more confusing. It does not
> override table level setting, so it is not intended to be used for a
> prohibitive reason like e.g. server doesn't support EXPLAIN the way it will
> be interpreted locally. It seems to act more like a default in case table
> level setting is absent. User may set table level use_remote_estimate to
> true, if cost of EXPLAIN is very small compared to that of table scan (with
> or without indexes) or adding conditional clauses to the query alters the
> costs heavily that the cost of EXPLAIN itself is justified. But I can be
> wrong about these intentions.
>
> If we go by the above intention behind table level setting, 2nd argument
> makes more sense as the table for which use_remote_estimate is true, can
> change the cost of join heavily because of the clauses in the join and it's
> better to get it from the foreign server than guessing it locally.
>
> Comments/suggestions are welcome.

I like option #2.  I don't really have a strong reason for that, but
it feels intuitive to me that we err on the side of using remote
estimates when in doubt.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] use_remote_estimate usage for join pushdown in postgres_fdw

2015-12-16 Thread Tom Lane
Robert Haas  writes:
> I like option #2.  I don't really have a strong reason for that, but
> it feels intuitive to me that we err on the side of using remote
> estimates when in doubt.

If we believe that, why isn't the default value of use_remote_estimate true?
(Maybe it should be.)

Another option that should be considered is that joins should pay
attention only to the server-level setting and not to the individual
tables' settings.  This would surely be cheaper to implement, and
it avoids any questions about whether to OR or AND the individual
settings.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] use_remote_estimate usage for join pushdown in postgres_fdw

2015-12-16 Thread Robert Haas
On Wed, Dec 16, 2015 at 1:11 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> I like option #2.  I don't really have a strong reason for that, but
>> it feels intuitive to me that we err on the side of using remote
>> estimates when in doubt.
>
> If we believe that, why isn't the default value of use_remote_estimate true?
> (Maybe it should be.)
>
> Another option that should be considered is that joins should pay
> attention only to the server-level setting and not to the individual
> tables' settings.  This would surely be cheaper to implement, and
> it avoids any questions about whether to OR or AND the individual
> settings.

That was Ashutosh's option #3.

use_remote_estimate is a pretty expensive option, which is why it's
not on by default.  But if you are willing to spend that effort for a
scan of table A parameterized by a value from table B, it seems likely
to me that you are also willing to spend the effort to accurately cost
a pushed-down join of A and B.  Actually, it seems like it would be
more surprising if you weren't: we're willing to accurately cost
iterating the scan of B, but not pushing the whole join down?  Hmm.

That's an arguable position, of course.

Actually, I think that neither use_remote_estimate nor
!use_remote_estimate is a particularly great option.
!use_remote_estimate produces results that are basically pulled out of
a hat.  use_remote_estimate produces good estimates, but it's pretty
expensive for a planning operation.  I'd like to have some other
alternative, like a local cache of metadata that we can consult when
!use_remote_estimate instead of just making things up, which might
tell us things like what indexes exist on the remote side.  But that's
a different project.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] use_remote_estimate usage for join pushdown in postgres_fdw

2015-12-11 Thread Ashutosh Bapat
Hi All,
postgres_fdw documentation says following about use_remote_estimate (
http://www.postgresql.org/docs/devel/static/postgres-fdw.html)
--
use_remote_estimate
This option, which can be specified for a foreign table or a foreign
server, controls whether postgres_fdw issues remote EXPLAIN commands to
obtain cost estimates. A setting for a foreign table overrides any setting
for its server, but only for that table. The default is false.
--

I am trying to see, how should we use this option in the context of join
pushdown and for
that matter any pushdown involving more than one table.

I came up with following arguments
1. Foreign base relations derive their use_remote_estimate setting either
from the server setting or the per table setting. A join between two
foreign relations should derive its use_remote_estimate setting from the
joining relations (recursively). This means that we will use EXPLAIN to
estimate costs of join if "all" the involved base foreign relations have
use_remote_estimate true (either they derive it from the server level
setting or table level setting).

2. Similar to 1, but use EXPLAIN to estimate costs if "any" of the involved
base foreign relations have use_remote_estimate is true.

3. Since join between two foreign relations is not a table level
phenomenon, but a server level phenomenon, we should use server level
setting. This means that we will use EXPLAIN output to estimate costs of
join if the foreign server has use_remote_estimate true, irrespective of
the setting for individual foreign relations involved in that join.

Unfortunately the documentation and comments in code do not say much about
the intention (i.e. why and how is this setting expected to be used) of
this setting in the context or server.

The intention behind server level setting is more confusing. It does not
override table level setting, so it is not intended to be used for a
prohibitive reason like e.g. server doesn't support EXPLAIN the way it will
be interpreted locally. It seems to act more like a default in case table
level setting is absent. User may set table level use_remote_estimate to
true, if cost of EXPLAIN is very small compared to that of table scan (with
or without indexes) or adding conditional clauses to the query alters the
costs heavily that the cost of EXPLAIN itself is justified. But I can be
wrong about these intentions.

If we go by the above intention behind table level setting, 2nd argument
makes more sense as the table for which use_remote_estimate is true, can
change the cost of join heavily because of the clauses in the join and it's
better to get it from the foreign server than guessing it locally.

Comments/suggestions are welcome.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company