Re: [HACKERS] Getting sorted data from foreign server for merge join

Corey Huinker Sat, 07 Nov 2015 09:08:55 -0800

On Fri, Nov 6, 2015 at 11:32 AM, Robert Haas <[email protected]> wrote:


> On Thu, Nov 5, 2015 at 11:54 PM, Ashutosh Bapat
> <[email protected]> wrote:
> > Hi All,
> > PFA patch to get data sorted from the foreign server (postgres_fdw)
> > according to the pathkeys useful for merge join.
> >
> > For a given base relation (extendable to join when that becomes
> available in
> > postgres_fdw), the patch tries to find merge joinable clauses. It then
> adds
> > paths with pathkeys extracted out of these merge joinable clauses. The
> merge
> > joinable clauses form equivalence classes. The patch searches in
> > root->eq_classes for equivalence members belonging to the given relation.
> > For every such expression it creates a single member pathkey list and
> > corresponding path. The test postgres_fdw.sql has an existing join which
> > uses merge join. With this patch the data is sorted on the foreign server
> > than locally.
> >
> > While mergejoinable clauses can be obtained from rel->joininfo as well.
> But
> > rel->joininfo contains other clauses as well and we need extra efforts to
> > remove duplicates if the same expression appears in multiple merge
> joinable
> > clauses.
> >
> > Two joining relations can have multiple merge joinable clauses, requiring
> > multi-member pathkeys so that merge join is efficient to the maximum
> extent.
> > The order in which the expressions appears in pathkeys can change the
> costs
> > of sorting the data according to the pathkeys, depending upon the
> > expressions and the presence of indexes containing those expressions.
> Thus
> > ideally we would need to club all the expressions appearing in all the
> > clauses for given two relations and create paths with pathkeys for every
> > order of these expressions.That explodes the number of possible paths. We
> > may restrict the number of paths created by considering only certain
> orders
> > like sort_inner_and_outer(). In any case, costing such paths increases
> the
> > planning time which may not be worth it. So, this patch uses a heuristic
> > approach of creating single member pathkeys path for every merge joinable
> > expression.
> >
> > The pathkeys need to be canonicalised using make_canonical_pathkey(),
> which
> > is a static function. I have added a TODO and comments in the patch
> > explaining possible ways to avoid "extern"alization of this function.
> >
> > Comments/suggestions are welcome.
>
> I think this approach is generally reasonable, but I suggested parts
> of it, so may be biased.  I would be interested in hearing the
> opinions of others.
>
> Random notes:
>
> "possibily" is a typo.
>
> usable_pklist is confusing because it seems like it might be talking
> about primary keys rather than pathkeys.  Also, I realize now, looking
> at this again, that you're saying "usable" when what I really think
> you mean is "useful".  Lots of pathkeys are usable, but only a few of
> those are useful.  I suggest renaming usable_pathkeys to
> query_pathkeys and usable_pklist to useful_pathkeys.  Similarly, let's
> rename generate_pathkeys_for_relation() to
> get_useful_pathkeys_for_relation().
>
> Although I'm usually on the side of marking things as extern whenever
> we find it convenient, I'm nervous about doing that to
> make_canonical_pathkey(), because it has side effects.  Searching the
> list of canonical pathkeys for the one we need is reasonable, but is
> it really right to ever think that we might create a new one at this
> stage?  Maybe it is, but if so I'd like to hear a good explanation as
> to why.
>
>  Is the comment "Equivalence classes covering relations other than the
> current one are of interest here" missing a "not"?
>
> I don't find this comment illuminating:
>
> +         * In case of child relation, we need to check that the
> +         * equivalence class indicates a join to a relation other than
> +         * parents, other children and itself (something similar to
> above).
> +         * Otherwise we will end up creating useless paths. The code
> below is
> +         * similar to generate_implied_equalities_for_column(), which
> might
> +         * give a hint.
>
> That basically just says that we have to do it this way because the
> other way would be wrong.  But it doesn't say WHY the other way would
> be wrong. Then a few lines later, you have another comment which says
> the same thing again:
>
> +                /*
> +                 * Ignore equivalence members which correspond to children
> +                 * or same relation or to parent relations
> +                 */
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list ([email protected])
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Sorry to barge in late, but I was wondering if what we've learned with this
patch can be applied to the case of asserting a sort order on a query
returning from dblink().

This is a common use-case for us where one machine will issue a query
request to a non-Pg database that happens to speak libpq (Vertica,
Redshift, I guess Greenplum would be one too), but for whom foreign data
wrappers aren't yet an option, and then join that data to local tables,
many of which have indexes matching the sort order that I know is coming
from the dblink(), but have no way of telling that to the query planner.

I'm guessing the answer is "no", but I wanted to make you aware of a
similar real-world need.

Re: [HACKERS] Getting sorted data from foreign server for merge join

Reply via email to