Re: [HACKERS] Research/Implementation of Nested Loop Join optimization

Dann Corbit Wed, 23 Jul 2008 13:24:26 -0700

When you install the source tree (e.g. in folder \postgresql-8.3.x) you
will want to examine nodeMergejoin.c typically found in a path similar
to this:


\postgresql-8.3.x\src\backend\executor\nodeMergejoin.c

 

Here are the comments from the version on my machine:

 

/*

 * INTERFACE ROUTINES

 *                            ExecMergeJoin
mergejoin outer and inner relations.

 *                            ExecInitMergeJoin
creates and initializes run time states

 *                            ExecEndMergeJoin
cleans up the node.

 *

 * NOTES

 *

 *                            Merge-join is done by joining the inner
and outer tuples satisfying

 *                            join clauses of the form ((= outerKey
innerKey) ...).

 *                            The join clause list is provided by the
query planner and may contain

 *                            more than one (= outerKey innerKey) clause
(for composite sort key).

 *

 *                            However, the query executor needs to know
whether an outer

 *                            tuple is "greater/smaller" than an inner
tuple so that it can

 *                            "synchronize" the two relations. For
example, consider the following

 *                            relations:

 *

 *                                                            outer: (0
^1 1 2 5 5 5 6 6 7)            current tuple: 1

 *                                                            inner: (1
^3 5 5 5 5 6)                                       current tuple: 3

 *

 *                            To continue the merge-join, the executor
needs to scan both inner

 *                            and outer relations till the matching
tuples 5. It needs to know

 *                            that currently inner tuple 3 is "greater"
than outer tuple 1 and

 *                            therefore it should scan the outer
relation first to find a

 *                            matching tuple and so on.

 *

 *                            Therefore, rather than directly executing
the merge join clauses,

 *                            we evaluate the left and right key
expressions separately and then

 *                            compare the columns one at a time (see
MJCompare).  The planner

 *                            passes us enough information about the
sort ordering of the inputs

 *                            to allow us to determine how to make the
comparison.  We may use the

 *                            appropriate btree comparison function,
since Postgres' only notion

 *                            of ordering is specified by btree
opfamilies.

 *

 *

 *                            Consider the above relations and suppose
that the executor has

 *                            just joined the first outer "5" with the
last inner "5". The

 *                            next step is of course to join the second
outer "5" with all

 *                            the inner "5's". This requires
repositioning the inner "cursor"

 *                            to point at the first inner "5". This is
done by "marking" the

 *                            first inner 5 so we can restore the
"cursor" to it before joining

 *                            with the second outer 5. The access method
interface provides

 *                            routines to mark and restore to a tuple.

 *

 *

 *                            Essential operation of the merge join
algorithm is as follows:

 *

 *                            Join {

 *                                            get initial outer and
inner tuples
INITIALIZE

 *                                            do forever {

 *                                                            while
(outer != inner) {
SKIP_TEST

 *
if (outer < inner)

 *
advance outer
SKIPOUTER_ADVANCE

 *
else

 *
advance inner
SKIPINNER_ADVANCE

 *                                                            }

 *                                                            mark inner
position
SKIP_TEST

 *                                                            do forever
{

 *
while (outer == inner) {

 *
join tuples
JOINTUPLES

 *
advance inner position
NEXTINNER

 *
}

 *
advance outer position
NEXTOUTER

 *
if (outer == mark)
TESTOUTER

 *
restore inner position to mark                   TESTOUTER

 *
else

 *
break    // return to top of outer loop

 *                                                            }

 *                                            }

 *                            }

 *

 *                            The merge join operation is coded in the
fashion

 *                            of a state machine.  At each state, we do
something and then

 *                            proceed to another state.  This state is
stored in the node's

 *                            execution state information and is
preserved across calls to

 *                            ExecMergeJoin. -cim 10/31/89

 */

 

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Manoel Henrique
Sent: Wednesday, July 23, 2008 1:17 PM
To: pgsql-hackers@postgresql.org
Subject: [HACKERS] Research/Implementation of Nested Loop Join
optimization

 

Hi! I`m a researcher from PUC-Rio (Brazil) and we`re studying about an
Joins, and we`d like to implement an optimization on the Nested Loop
Join, this optimization consists on while scanning the inner table, the
iteration would go from up-down then backwards(down-up) to take
advantage of the buffer pages in memory.

We`d work with MaterialScan and only NestedLoop (we`re dropping all
indexes and keys to make it this way). The research objective is to show
some students how a DBMS works.


Does PostgreSQL already works this way?

Is it possible to implement such thing? Is it easy? how hard?

 

 

Thank you in advance,

Manoel Henrique Souza.

Re: [HACKERS] Research/Implementation of Nested Loop Join optimization

Reply via email to