RE: CBO and cartesian product

2003-10-13 Thread John Kanagaraj
Tim/Dilip,

Unfortunately, as this is an 'Apps' instance, the parameters
DB_FILE_MULTIBLOCK_READ_COUNT should be set to 8 and the
OPTIMIZER_INDEX_CACHING parameter should *not* be set (letting it
default)... This is as per ML Note 216205.1 - non compliance = not
supported.

Since this is a customized report though, you *may* be able to get away with
setting them within the program (or reverting to RULE as a quick fix)

John Kanagaraj
DB Soft Inc
Phone: 408-970-7002 (W)

Listen to great, commercial-free christian music 24x7x365 at
http://www.klove.com

** The opinions and facts contained in this message are entirely mine and do
not reflect those of my employer or customers **

-Original Message-
From: Tim Gorman [mailto:[EMAIL PROTECTED] 
Sent: Saturday, October 11, 2003 11:14 AM
To: Multiple recipients of list ORACLE-L
Subject: Re: CBO and cartesian product


Here is the short answer:
=

   * Set OPTIMIZER_INDEX_CACHING to 90
   * Make sure that DB_FILE_MULTIBLOCK_READ_COUNT is not overly high
   * Also, consider gathering column-level statistics on some of the
 indexed columns involved, especially if the query in question
 uses literal data values on them

Here is the long answer:


Starting in the 8i timeframe, the CBO started borrowing some 
techniques from
data warehouse STAR joins when confronted with any type of query that
traversed two different entity-relationship heirarchies 
starting from the
same table.

Say you have three tables (to keep it simple).  One table is a child
entity to the other two tables, which are both parent entities in ERD
terms.  The CBO detects that both parent tables are much 
smaller than the
child table.

OK, so there is no relationship between the two parent tables 
-- they are
both related only through the large child table.

Now, think about what traditional join methods are possible:

1) start with one of the parent tables as the driving table, do a
   indexed nested-loop range-scan during the join to the 
child table,
   and then perform indexed nested-loop unique-scan during 
the final
   join to the other parent table
2) reverse the order of option #1.  Start with the other parent
   table, join to the child, and then join up to the 
remaining parent
3) start with the child table and join up (via indexed 
unique-scans)
   to the two parent tables

The weak point of both of these options is probably the access 
of the child
table.  Plain and simple, it is difficult to efficiently get 
rows from it.
It is likely that the index supporting the foreign-key 
relationship from
either parent table is not very efficient by itself, resulting 
in a very
expensive range-scan, requiring a massive number of logical 
I/Os and cost
calculated by the CBO.

So, the CBO in 8i started utilizing another option, which 
initially blew my
mind first time I saw it happen.  It was the point which I 
realized that the
CBO was _way_ smarter than humans...

This additional option is to perform a cartesian join between the two
parent tables, to come up with one result set.  Then, using 
the filtered
cartesian result set from that join, the CBO probes into the 
large child
table using the _combined_ keys from both parent tables!

Rather brilliant choice, in most cases.  The cartesian join, despite
everybody's visceral fear of it, is actually rather 
insignificant if the two
parent tables are small.  And it is even smaller if there are good
filtering predicates on those tables in the WHERE clause.

So, instead of having to retrieve rows from the large child 
table using one
or the other of the relatively ineffective indexes supporting 
each foreign
key, the CBO merges and uses both keys, resulting in a far 
more effective
access method into the child table.

So, chances are good that this is the situation you are 
facing.  Is this
correct?  Can you verify the basic relationships between the tables
involved?

So, now the question is:  why did the CBO make the wrong choice?

First, the default setting of the OPTIMIZER_INDEX_CACHING 
parameter (i.e.
0) represents a flaw in the basic costing algorithm used by the CBO.
Setting the parameter to 90 or so fixes this flaw.  For a more detailed
explanation, please feel free to view my paper Search for 
Intelligent Life
in the CBO, available online at http://www.EvDBT.com/papers.htm;.

Changing that alone may cause the CBO to rethink its decision 
to go with the
derived STAR-join scheme involving a cartesian join, and 
instead choose the
indexed nested-loops scheme which is the __only__ possible 
choice by the
RBO.  By discounting the cost of index-based access methods, 
the CBO (which
considers _all_ possible access methods and chooses the one 
with the lowest
cost) may now choose the index-based plan.  Once again, the RBO only
considered the one plan, which in this case turned into a bit 
of luck for
the RBO, making it look good.

You can experiment

Re: CBO and cartesian product

2003-10-11 Thread Tim Gorman
Here is the short answer:
=

   * Set OPTIMIZER_INDEX_CACHING to 90
   * Make sure that DB_FILE_MULTIBLOCK_READ_COUNT is not overly high
   * Also, consider gathering column-level statistics on some of the
 indexed columns involved, especially if the query in question
 uses literal data values on them

Here is the long answer:


Starting in the 8i timeframe, the CBO started borrowing some techniques from
data warehouse STAR joins when confronted with any type of query that
traversed two different entity-relationship heirarchies starting from the
same table.

Say you have three tables (to keep it simple).  One table is a child
entity to the other two tables, which are both parent entities in ERD
terms.  The CBO detects that both parent tables are much smaller than the
child table.

OK, so there is no relationship between the two parent tables -- they are
both related only through the large child table.

Now, think about what traditional join methods are possible:

1) start with one of the parent tables as the driving table, do a
   indexed nested-loop range-scan during the join to the child table,
   and then perform indexed nested-loop unique-scan during the final
   join to the other parent table
2) reverse the order of option #1.  Start with the other parent
   table, join to the child, and then join up to the remaining parent
3) start with the child table and join up (via indexed unique-scans)
   to the two parent tables

The weak point of both of these options is probably the access of the child
table.  Plain and simple, it is difficult to efficiently get rows from it.
It is likely that the index supporting the foreign-key relationship from
either parent table is not very efficient by itself, resulting in a very
expensive range-scan, requiring a massive number of logical I/Os and cost
calculated by the CBO.

So, the CBO in 8i started utilizing another option, which initially blew my
mind first time I saw it happen.  It was the point which I realized that the
CBO was _way_ smarter than humans...

This additional option is to perform a cartesian join between the two
parent tables, to come up with one result set.  Then, using the filtered
cartesian result set from that join, the CBO probes into the large child
table using the _combined_ keys from both parent tables!

Rather brilliant choice, in most cases.  The cartesian join, despite
everybody's visceral fear of it, is actually rather insignificant if the two
parent tables are small.  And it is even smaller if there are good
filtering predicates on those tables in the WHERE clause.

So, instead of having to retrieve rows from the large child table using one
or the other of the relatively ineffective indexes supporting each foreign
key, the CBO merges and uses both keys, resulting in a far more effective
access method into the child table.

So, chances are good that this is the situation you are facing.  Is this
correct?  Can you verify the basic relationships between the tables
involved?

So, now the question is:  why did the CBO make the wrong choice?

First, the default setting of the OPTIMIZER_INDEX_CACHING parameter (i.e.
0) represents a flaw in the basic costing algorithm used by the CBO.
Setting the parameter to 90 or so fixes this flaw.  For a more detailed
explanation, please feel free to view my paper Search for Intelligent Life
in the CBO, available online at http://www.EvDBT.com/papers.htm;.

Changing that alone may cause the CBO to rethink its decision to go with the
derived STAR-join scheme involving a cartesian join, and instead choose the
indexed nested-loops scheme which is the __only__ possible choice by the
RBO.  By discounting the cost of index-based access methods, the CBO (which
considers _all_ possible access methods and chooses the one with the lowest
cost) may now choose the index-based plan.  Once again, the RBO only
considered the one plan, which in this case turned into a bit of luck for
the RBO, making it look good.

You can experiment with this parameter change using ALTER SESSION, if you
like.  This is one of the _few_ occasions on which changing a parameter
actually has an impact on performance.

There are some other parameter settings which may impact how the CBO costs
this query.  For example, if DB_FILE_MULTIBLOCK_READ_COUNT is set higher
than its default value of 8 or 16, then the CBO will think that access plans
involving FULL table scans are cheaper than they are.

Another possible cause for the CBO's bad decision is it's default assumption
that data values in a column are evenly distributed.  Gathering stats for
indexed columns only gathers the number of distinct keys and the low/high
values, by default.  Therefore, the CBO has no choice except to assume an
even distribution of data, that each distinct data value is used by an equal
number of rows.  Gathering column-level statistics creates histograms in
the data dictionary that