Re: [HACKERS] Improving our clauseless-join heuristics

Amit Kapila Wed, 18 Apr 2012 03:34:34 -0700

>> Um ... wasn't that well enough explained already?

Yes, it was well explained and I understood also, but what I wanted to
understand the solution with which you have resolved the problem.

The way I am telling was as below code. 
With this extra paths will get generated, but it will as well consider for
joining c and d in query:
select * from a, b, c, d where a.x = b.y and (a.z = c.c or a.z = d.d)

static void 
make_rels_by_clause_joins(PlannerInfo *root, 
                                                  RelOptInfo *old_rel, 
                                                  ListCell *other_rels) 
{ 
        ListCell   *l; 
 ++       bool        bIsold_relJointoanyother_rel = false; 

        for_each_cell(l, other_rels) 
        { 
                RelOptInfo *other_rel = (RelOptInfo *) lfirst(l); 

                if (!bms_overlap(old_rel->relids, other_rel->relids) && 
                        (have_relevant_joinclause(root, old_rel, other_rel)
|| 
                         have_join_order_restriction(root, old_rel,
other_rel))) 
                { 
 ++                    bIsold_relJointoanyother_rel = true; 
                        (void) make_join_rel(root, old_rel, other_rel); 
                } 
        } 
 ++       /*if old_rel is not able to join with any other rel than try
joining it 
 ++       with other_rels which has join clause.*/ 
 ++        if(bIsold_relJointoanyother_rel == false) 
 ++       { 

 ++                     for_each_cell(l, other_rels) 
 ++                     { 
 ++                          RelOptInfo *other_rel = (RelOptInfo *)
lfirst(l); 

 ++                          if (!bms_overlap(old_rel->relids,
other_rel->relids) && 
 ++                              (has_join_restriction(root,
other_rel)||other_rel->joininfo != NIL)) 
 ++                          { 

 ++                                     (void) make_join_rel(root, old_rel,
other_rel); 
 ++                          } 
 ++                     } 
 ++       }
}
-----Original Message-----
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: Wednesday, April 18, 2012 11:59 AM
To: Amit Kapila
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Improving our clauseless-join heuristics 

Amit Kapila <amit.kap...@huawei.com> writes:
>> I'm afraid I'm still not following you very well.  Perhaps you could
>> submit a proposed patch?

> Before that can you please explain in little more detail (if possible with
> small example) about the idea you have told in original mail : "is there
any
> join clause that both these relations participate in?"

Um ... wasn't that well enough explained already?

I think there are basically two cases.  You can have a join clause that
is immediately useful for joining two relations, say

        select ... from a,b where a.x = b.y;

This is "immediate" in the sense that you can apply it when joining a
to b, regardless of any other relations involved in the query.

Or you can have a case like

        select ... from a,b,c where (a.x + b.y) = c.z;

This clause is not immediately useful for joining any two of the three
relations in the query.  It will be useful when we get to level 3,
particularly so if we chose to join a and b first and there's an index
on c.z.  But we would have had to accept doing a cartesian join of a and
b to arrive at that situation.  In this example, we have no alternative
except to do some cartesian join at level 2 --- but as soon as we add
some more tables and join clauses to the example, we could get
distracted from the possibility that a cartesian join of a and b might
be a good idea.

Given that make_rels_by_joins doesn't (and shouldn't IMO) have any
detailed understanding of the semantics of particular join clauses,
I would not expect it to realize that joining a to b is the most likely
option out of the three possible clauseless joins that are available
at level 2 in this query.  It's going to have to generate all 3, and
then costing at the next level will figure out what's best to do.

However, I think it *does* need to understand that clauses relating
3 or more relations can work like this.  In the code as it stood before
last week, it would actively reject joining a to b if there were any
additional relations in the query.  That's just not right.

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Improving our clauseless-join heuristics

Reply via email to